arXivDaily arXiv每日学术速递 周一至周五更新
全部学科分类 1447
专题追踪
2512.10350 2026-02-02 cs.LG cs.AI

Geometric Dynamics of Agentic Loops in Large Language Models

Nicolas Tacheny

详情
英文摘要

Iterative LLM systems(self-refinement, chain-of-thought, autonomous agents) are increasingly deployed, yet their temporal dynamics remain uncharacterized. Prior work evaluates task performance at convergence but ignores the trajectory: how does semantic content evolve across iterations? Does it stabilize, drift, or oscillate? Without answering these questions, we cannot predict system behavior, guarantee stability, or systematically design iterative architectures. We formalize agentic loops as discrete dynamical systems in semantic space. Borrowing from dynamical systems theory, we define trajectories, attractors and dynamical regimes for recursive LLM transformations, providing rigorous geometric definitions adapted to this setting. Our framework reveals that agentic loops exhibit classifiable dynamics: contractive (convergence toward stable semantic attractors), oscillatory (cycling among attractors), or exploratory (unbounded divergence). Experiments on singular loops validate the framework. Iterative paraphrasing produces contractive dynamics with measurable attractor formation and decreasing dispersion. Iterative negation produces exploratory dynamics with no stable structure. Crucially, prompt design directly controls the dynamical regime - the same model exhibits fundamentally different geometric behaviors depending solely on the transformation applied. This work establishes that iterative LLM dynamics are predictable and controllable, opening new directions for stability analysis, trajectory forecasting, and principled design of composite loops that balance convergence and exploration.

2512.07287 2026-02-02 cs.LG cs.AI

Experience-Evolving Multi-Turn Tool-Use Agent with Hybrid Episodic-Procedural Memory

Sijia Li, Yuchen Huang, Zifan Liu, Zijian Li, Jingjing fu, Lei Song, Jiang Bian, Jun Zhang, Rui Wang

详情
英文摘要

As intents unfold and environments change, multi-turn agents face continuously shifting decision contexts. Although reusing past experience is intuitively appealing, existing approaches remain limited: full trajectories are often too context-specific to transfer, while tool-level reuse ignores the surrounding context and environment. In this paper, we introduce a hybrid episodic-procedural memory strategy (H-EPM) that enables experience-induced self-evolution of multi-turn tool-use policies by adaptively reusing partially overlapping successful experiences during both inference and training. Inspired by human episodic-procedural integration, we construct a tool graph from accumulated trajectories, where recurring tool-to-tool dependencies capture procedural routines and each edge is augmented with compact episodic summaries of relevant context. At inference time, the agent dynamically balances episodic recall for contextual reasoning with procedural execution for routine steps. Beyond inference, H-EPM introduces a memory-guided reinforcement learning paradigm that directly addresses a core challenge in multi-turn agent reinforcement learning, namely ineffective exploration over long trajectories. By biasing exploration toward historically successful tool transitions, H-EPM learns a stronger policy that generalizes at inference time without relying on domain-specific experience collection. Experiments show that H-EPM consistently delivers substantial inference-time gains over strong baselines across multi-turn tool-use benchmarks, reaching improvements of up to fifty percent. It also improves reinforcement learning policy performance, achieving gains of up to forty percent on out-of-distribution tasks.

2512.06746 2026-02-02 cs.CV cs.AI

AlignGemini: Generalizable AI-Generated Image Detection Through Task-Model Alignment

Ruoxin Chen, Jiahui Gao, Kaiqing Lin, Keyue Zhang, Yandan Zhao, Isabel Guan, Taiping Yao, Shouhong Ding

详情
英文摘要

Vision Language Models (VLMs) are increasingly used for detecting AI-generated images (AIGI). However, converting VLMs into reliable detectors is resource-intensive, and the resulting models often suffer from hallucination and poor generalization. To investigate the root cause, we conduct an empirical analysis and identify two consistent behaviors. First, fine-tuning VLMs with semantic supervision improves semantic discrimination and generalizes well to unseen data. Second, fine-tuning VLMs with pixel-artifact supervision leads to weak generalization. These findings reveal a fundamental task-model misalignment. VLMs are optimized for high-level semantic reasoning and lack inductive bias toward low-level pixel artifacts. In contrast, conventional vision models effectively capture pixel-level artifacts but are less sensitive to semantic inconsistencies. This indicates that different models are naturally suited to different subtasks. Based on this insight, we formulate AIGI detection as two orthogonal subtasks: semantic consistency checking and pixel-artifact detection. Neglecting either subtask leads to systematic detection failures. We further propose the Task-Model Alignment principle and instantiate it in a two-branch detector, AlignGemini. The detector combines a VLM trained with pure semantic supervision and a vision model trained with pure pixel-artifact supervision. By enforcing clear specialization, each branch captures complementary cues. Experiments on in-the-wild benchmarks show that AlignGemini improves average accuracy by 9.5 percent using simplified training data. These results demonstrate that task-model alignment is an effective principle for generalizable AIGI detection.

2512.06013 2026-02-02 cs.CV cs.RO

VAT: Vision Action Transformer by Unlocking Full Representation of ViT

Wenhao Li, Chengwei Ma, Weixin Mao

详情
英文摘要

In robot learning, Vision Transformers (ViTs) are standard for visual perception, yet most methods discard valuable information by using only the final layer's features. We argue this provides an insufficient representation and propose the Vision Action Transformer (VAT), a novel architecture that is extended from ViT and unlocks the full feature hierarchy of ViT. VAT processes specialized action tokens with visual features across all transformer layers, enabling a deep and progressive fusion of perception and action generation. On a suite of simulated manipulation tasks, VAT achieves a 98.15\% average success rate across four LIBERO benchmarks, establishing a new state-of-the-art by outperforming prior methods like OpenVLA-OFT. Our work presents not only a powerful model for imitation learning but also demonstrates the critical importance of leveraging the complete ''representation trajectory'' of vision models to advance robotic policy. The GitHub URL for the project code is https://github.com/sellerbubble/VAT.

2512.03393 2026-02-02 cs.LG stat.ML

Tuning-Free Structured Sparse Recovery of Multiple Measurement Vectors using Implicit Regularization

Lakshmi Jayalal, Sheetal Kalyani

详情
英文摘要

Recovering jointly sparse signals in the multiple measurement vectors (MMV) setting is a fundamental problem in machine learning, but traditional methods often require careful parameter tuning or prior knowledge of the sparsity of the signal and/or noise variance. We propose a tuning-free framework that leverages implicit regularization (IR) from overparameterization to overcome this limitation. Our approach reparameterizes the estimation matrix into factors that decouple the shared row-support from individual vector entries and applies gradient descent to a standard least-squares objective. We prove that with a sufficiently small and balanced initialization, the optimization dynamics exhibit a "momentum-like" effect where the true support grows significantly faster. Leveraging a Lyapunov-based analysis of the gradient flow, we further establish formal guarantees that the solution trajectory converges towards an idealized row-sparse solution. Empirical results demonstrate that our tuning-free approach achieves performance comparable to optimally tuned established methods. Furthermore, our framework significantly outperforms these baselines in scenarios where accurate priors are unavailable to the baselines.

2511.20275 2026-02-02 cs.RO

HAFO: A Force-Adaptive Control Framework for Humanoid Robots in Intense Interaction Environments

Chenhui Dong, Haozhe Xu, Wenhao Feng, Zhipeng Wang, Yanmin Zhou, Yifei Zhao, Bin He

详情
英文摘要

Reinforcement learning (RL) controllers have made impressive progress in humanoid locomotion and light-weight object manipulation. However, achieving robust and precise motion control with intense force interaction remains a significant challenge. To address these limitations, this paper proposes HAFO, a dual-agent reinforcement learning framework that concurrently optimizes both a robust locomotion strategy and a precise upper-body manipulation strategy via coupled training. We employ a constrained residual action space to improve dual-agent training stability and sample efficiency. The external tension disturbances are explicitly modeled using a spring-damper system, allowing for fine-grained force control through manipulation of the virtual spring. In this process, the reinforcement learning policy autonomously generates a disturbance-rejection response by utilizing environmental feedback. The experimental results demonstrate that HAFO achieves whole-body control for humanoid robots across diverse force-interaction environments using a single dual-agent policy, delivering outstanding performance under load-bearing and thrust-disturbance conditions, while maintaining stable operation even under rope suspension state.

2511.18616 2026-02-02 cs.CL

What Helps Language Models Predict Human Beliefs: Demographics or Prior Stances?

Joseph Malone, Rachith Aiyappa, Byunghwee Lee, Haewoon Kwak, Jisun An, Yong-Yeol Ahn

Comments 32 pages, 6 figures

详情
英文摘要

Beliefs shape how people reason, communicate, and behave. Rather than existing in isolation, they exhibit a rich correlational structure--some connected through logical dependencies, others through indirect associations or social processes. As usage of large language models (LLMs) becomes more ubiquitous in our society, LLMs' ability to understand and reason through human beliefs has many implications from privacy issues to personalized persuasion and the potential for stereotyping. Yet how LLMs capture this interrelated landscape of beliefs remains unclear. For instance, when predicting someone's beliefs, what information affects the prediction most--who they are (demographics), what else they believe (prior stances), or a combination of both? We address these questions using data from an online debate platform, evaluating the ability of off-the-shelf open-weight LLMs to predict individuals' stance under four conditions: no context, demographics only, prior beliefs only, and both combined. We find that both types of information improve predictions over a blind baseline, with their combination yielding the best performance in most cases. However, the relative value of each varies substantially across belief domains. These findings reveal how current LLMs leverage different types of social information when reasoning about human beliefs, highlighting both their capabilities and limitations.

2511.17958 2026-02-02 cs.CV

HEAL: Learning-Free Source Free Unsupervised Domain Adaptation for Cross-Modality Medical Image Segmentation

Yulong Shi, Jiapeng Li, Lin Qi

Comments Accepted by The 36th British Machine Vision Conference (BMVC 2025)

Journal ref 36th British Machine Vision Conference (BMVC2025)

详情
英文摘要

Growing demands for clinical data privacy and storage constraints have spurred advances in Source Free Unsupervised Domain Adaptation (SFUDA). SFUDA addresses the domain shift by adapting models from the source domain to the unseen target domain without accessing source data, even when target-domain labels are unavailable. However, SFUDA faces significant challenges: the absence of source domain data and label supervision in the target domain due to source free and unsupervised settings. To address these issues, we propose HEAL, a novel SFUDA framework that integrates Hierarchical denoising, Edge-guided selection, size-Aware fusion, and Learning-free characteristic. Large-scale cross-modality experiments demonstrate that our method outperforms existing SFUDA approaches, achieving state-of-the-art (SOTA) performance. The source code is publicly available at: https://github.com/derekshiii/HEAL.

2511.14109 2026-02-02 cs.CV

A2GC: Asymmetric Aggregation with Geometric Constraints for Locally Aggregated Descriptors

Zhenyu Li, Tianyi Shang

Comments 8 pages, 4figures

详情
英文摘要

Visual Place Recognition (VPR) aims to match query images against a database using visual cues. State-of-the-art methods aggregate features from deep backbones to form global descriptors. Optimal transport-based aggregation methods reformulate feature-to-cluster assignment as a transport problem, but the standard Sinkhorn algorithm symmetrically treats source and target marginals, limiting effectiveness when image features and cluster centers exhibit substantially different distributions. We propose an asymmetric aggregation VPR method with geometric constraints for locally aggregated descriptors, called $A^2$GC-VPR. Our method employs row-column normalization averaging with separate marginal calibration, enabling asymmetric matching that adapts to distributional discrepancies in visual place recognition. Geometric constraints are incorporated through learnable coordinate embeddings, computing compatibility scores fused with feature similarities, thereby promoting spatially proximal features to the same cluster and enhancing spatial awareness. Experimental results on MSLS, NordLand, and Pittsburgh datasets demonstrate superior performance, validating the effectiveness of our approach in improving matching accuracy and robustness.

2511.10691 2026-02-02 cs.CL cs.AI

Evaluating from Benign to Dynamic Adversarial: A Squid Game for Large Language Models

Zijian Chen, Wenjun Zhang, Guangtao Zhai

Comments 31 pages, 15 figures

详情
英文摘要

The potential data contamination issue in contemporary large language models (LLMs) benchmarks presents a fundamental challenge to establishing trustworthy evaluation frameworks. Meanwhile, they predominantly assume benign, resource-rich settings, leaving the behavior of LLMs under pressure unexplored. In this paper, we introduce \textsc{Squid Game}, a dynamic and adversarial evaluation environment with resource-constrained and asymmetric information settings elaborated to evaluate LLMs through interactive gameplay against other LLM opponents. Squid Game consists of six elimination-style levels, focusing on multi-faceted abilities, including instruction-following, code, reasoning, planning, and safety alignment. We evaluate over 50 LLMs on Squid Game, presenting the largest behavioral evaluation study of general LLMs on dynamic adversarial scenarios. We observe a clear generational phase transition in performance in the same model lineage and find evidence that some models resort to speculative shortcuts to win the game, indicating the possibility of higher-level evaluation paradigm contamination in static benchmarks. We also compare prominent LLM benchmarks and \textsc{Squid Game}, highlighting that dynamic evaluation can serve as a complementary part for static evaluations. Project page: https://github.com/zijianchen98/LLM_Squid_Game.

2511.09529 2026-02-02 cs.LG

SiDGen: Structure-informed Diffusion for Generative modeling of Ligands for Proteins

Samyak Sanghvi, Nishant Ranjan, Tarak Karmakar

Comments 17 pages, 4 figures

详情
英文摘要

Structure-based drug design (SBDD) faces a fundamental scaling fidelity dilemma: rich pocket-aware conditioning captures interaction geometry but can be costly, often scales quadratically ($O(L^2)$) or worse with protein length ($L$), while efficient sequence-only conditioning can miss key interaction structure. We propose SiDGen, a structure-informed discrete diffusion framework that resolves this trade-off through a Topological Information Bottleneck (TIB). SiDGen leverages a learned, soft assignment mechanism to compress residue-level protein representations into a compact bottleneck enabling downstream pairwise computations on the coarse grid ($O(L^2/s^2)$). This design reduces memory and computational cost without compromising generative accuracy. Our approach achieves state-of-the-art performance on CrossDocked2020 and DUD-E benchmarks while significantly reducing pairwise-tensor memory. SiDGen bridges the gap between sequence-based efficiency and pocket-aware conditioning, offering a scalable path for high-throughput structure-based discovery.

2511.09139 2026-02-02 cs.CV

MACEval: A Multi-Agent Continual Evaluation Network for Large Models

Zijian Chen, Yuze Sun, Yuan Tian, Wenjun Zhang, Guangtao Zhai

Comments 32 pages, 14 figures

详情
英文摘要

Hundreds of benchmarks dedicated to evaluating large models have been presented over the past few years. However, most of them remain closed-ended and are prone to overfitting due to the potential data contamination. Moreover, the increasing scale and scope of current benchmarks with transient metrics, as well as the heavily human-dependent curation procedure, pose significant challenges for timely maintenance and adaptation. In this paper, we introduce MACEval, a Multi-Agent Continual Evaluation network for dynamic evaluation of large models, and define new metrics to quantify performance longitudinally. MACEval employs an interactive and autonomous evaluation mode, utilizing role assignment, in-process data generation, and evaluation routing through a cascaded agent network. Extensive experiments on 23 large models demonstrate the effectiveness of MACEval, which also lightens the evaluation process and reduces a considerable amount of overhead. We hope that MACEval can broaden future directions of large model evaluation. Project page: https://github.com/zijianchen98/MACEval.

2511.07994 2026-02-02 cs.AI

Enhancing Logical Expressiveness in Graph Neural Networks via Path-Neighbor Aggregation

Han Yu, Xiaojuan Zhao, Aiping Li, Kai Chen, Ziniu Liu, Zhichao Peng

Comments Accepted to AAAI 2026

详情
英文摘要

Graph neural networks (GNNs) can effectively model structural information of graphs, making them widely used in knowledge graph (KG) reasoning. However, existing studies on the expressive power of GNNs mainly focuses on simple single-relation graphs, and there is still insufficient discussion on the power of GNN to express logical rules in KGs. How to enhance the logical expressive power of GNNs is still a key issue. Motivated by this, we propose Path-Neighbor enhanced GNN (PN-GNN), a method to enhance the logical expressive power of GNN by aggregating node-neighbor embeddings on the reasoning path. First, we analyze the logical expressive power of existing GNN-based methods and point out the shortcomings of the expressive power of these methods. Then, we theoretically investigate the logical expressive power of PN-GNN, showing that it not only has strictly stronger expressive power than C-GNN but also that its $(k+1)$-hop logical expressiveness is strictly superior to that of $k$-hop. Finally, we evaluate the logical expressive power of PN-GNN on six synthetic datasets and two real-world datasets. Both theoretical analysis and extensive experiments confirm that PN-GNN enhances the expressive power of logical rules without compromising generalization, as evidenced by its competitive performance in KG reasoning tasks.

2511.07888 2026-02-02 cs.CL

Breaking the Adversarial Robustness-Performance Trade-off in Text Classification via Manifold Purification

Chenhao Dang, Jing Ma

Comments 9 pages,3 figures, AAAI 2026 Poster

详情
英文摘要

A persistent challenge in text classification (TC) is that enhancing model robustness against adversarial attacks typically degrades performance on clean data. We argue that this challenge can be resolved by modeling the distribution of clean samples in the encoder embedding manifold. To this end, we propose the Manifold-Correcting Causal Flow (MC^2F), a two-module system that operates directly on sentence embeddings. A Stratified Riemannian Continuous Normalizing Flow (SR-CNF) learns the density of the clean data manifold. It identifies out-of-distribution embeddings, which are then corrected by a Geodesic Purification Solver. This solver projects adversarial points back onto the learned manifold via the shortest path, restoring a clean, semantically coherent representation. We conducted extensive evaluations on text classification (TC) across three datasets and multiple adversarial attacks. The results demonstrate that our method, MC^2F, not only establishes a new state-of-the-art in adversarial robustness but also fully preserves performance on clean data, even yielding modest gains in accuracy.

2511.07222 2026-02-02 cs.CV

Omni-View: Unlocking How Generation Facilitates Understanding in Unified 3D Model based on Multiview images

JiaKui Hu, Shanshan Zhao, Qing-Guo Chen, Xuerui Qiu, Jialun Liu, Zhao Xu, Weihua Luo, Kaifu Zhang, Yanye Lu

Comments Accepted by ICLR 2026

详情
英文摘要

This paper presents Omni-View, which extends the unified multimodal understanding and generation to 3D scenes based on multiview images, exploring the principle that "generation facilitates understanding". Consisting of understanding model, texture module, and geometry module, Omni-View jointly models scene understanding, novel view synthesis, and geometry estimation, enabling synergistic interaction between 3D scene understanding and generation tasks. By design, it leverages the spatiotemporal modeling capabilities of its texture module responsible for appearance synthesis, alongside the explicit geometric constraints provided by its dedicated geometry module, thereby enriching the model's holistic understanding of 3D scenes. Trained with a two-stage strategy, Omni-View achieves a state-of-the-art score of 55.4 on the VSI-Bench benchmark, outperforming existing specialized 3D understanding models, while simultaneously delivering strong performance in both novel view synthesis and 3D scene generation. The code and pretraiend models are open-sourced at https://github.com/AIDC-AI/Omni-View.

2511.02767 2026-02-02 cs.CV

Dynamic Reflections: Probing Video Representations with Text Alignment

Tyler Zhu, Tengda Han, Leonidas Guibas, Viorica Pătrăucean, Maks Ovsjanikov

Comments To appear at ICLR 2026. 27 pages, 12 figures

详情
英文摘要

The alignment of representations from different modalities has recently been shown to provide insights on the structural similarities and downstream capabilities of different encoders across diverse data types. While significant progress has been made in aligning images with text, the temporal nature of video data remains largely unexplored in this context. In this work, we conduct the first comprehensive study of video-text representation alignment, probing the capabilities of modern video and language encoders. Our findings reveal several key insights. First, we demonstrate that cross-modal alignment highly depends on the richness of both visual (static images vs. multi-frame videos) and text (single caption vs. a collection) data provided at test time, especially when using state-of-the-art video encoders. We propose parametric test-time scaling laws that capture this behavior and show remarkable predictive power against empirical observations. Secondly, we investigate the correlation between semantic alignment and performance on both semantic and non-semantic downstream tasks, providing initial evidence that strong alignment against text encoders may be linked to general-purpose video representation and understanding. Finally, we correlate temporal reasoning with cross-modal alignment providing a challenging test-bed for vision and language models. Overall, our work introduces video-text alignment as an informative zero-shot way to probe the representation power of different encoders for spatio-temporal data. Project page can be found at https://video-prh.github.io/

2511.00655 2026-02-02 cs.LG

Reviving Stale Updates: Data-Free Knowledge Distillation for Asynchronous Federated Learning

Baris Askin, Holger R. Roth, Zhenyu Sun, Carlee Joe-Wong, Gauri Joshi, Ziyue Xu

详情
英文摘要

Federated learning (FL) enables collaborative model training across distributed clients without sharing raw data, yet its scalability is limited by synchronization overhead. Asynchronous federated learning (AFL) alleviates this issue by allowing clients to communicate independently, thereby improving wall-clock efficiency in large-scale, hardware-heterogeneous environments. However, asynchrony introduces updates computed on outdated global models (staleness) that can destabilize optimization and hinder convergence. We propose FedRevive, an AFL framework that revives stale updates through data-free knowledge distillation (DFKD). FedRevive integrates parameter-space aggregation with a lightweight, server-side DFKD process that transfers knowledge from stale client updates to the current global model without access to data. A meta-learned generator synthesizes pseudo-samples used for multi-teacher distillation. A hybrid aggregation scheme that combines raw with DFKD updates effectively mitigates staleness while retaining AFL scalability. Experiments on various vision and text benchmarks show that FedRevive achieves faster training by up to 38.4% and higher final accuracy by up to 16.5% than asynchronous baselines.

2511.00067 2026-02-02 cs.LG cs.AI

Latent Domain Prompt Learning for Vision-Language Models

Zhixing Li, Arsham Gholamzadeh Khoee, Yinan Yu

Comments Accepted to ICASSP 2026

详情
英文摘要

The objective of domain generalization (DG) is to enable models to be robust against domain shift. DG is crucial for deploying vision-language models (VLMs) in real-world applications, yet most existing methods rely on domain labels that may not be available and often ambiguous. We instead study the DG setting where models must generalize well without access to explicit domain labels. Our key idea is to represent an unseen target domain as a combination of latent domains automatically discovered from training data, enabling the model to adaptively transfer knowledge across domains. To realize this, we perform latent domain clustering on image features and fuse domain-specific text features based on the similarity between the input image and each latent domain. Experiments on four benchmarks show that this strategy yields consistent gains over VLM-based baselines and provides new insights into improving robustness under domain shift.

2510.27410 2026-02-02 cs.AI

Closing the Expression Gap in LLM Instructions via Socratic Questioning

Jianwen Sun, Yukang Feng, Yifan Chang, Chuanhao Li, Zizhen Li, Jiaxin Ai, Fanrui Zhang, Yu Dai, Kaipeng Zhang

详情
英文摘要

A fundamental bottleneck in human-AI collaboration is the ``intention expression gap," the difficulty for humans to effectively convey complex, high-dimensional thoughts to AI. This challenge often traps users in inefficient trial-and-error loops and is exacerbated by the diverse expertise levels of users. We reframe this problem from passive instruction following to a Socratic collaboration paradigm, proposing an agent that actively probes for information to resolve its uncertainty about user intent. we name the proposed agent Nous, trained to acquire proficiency in this inquiry policy. The core mechanism of Nous is a training framework grounded in the first principles of information theory. Within this framework, we define the information gain from dialogue as an intrinsic reward signal, which is fundamentally equivalent to the reduction of Shannon entropy over a structured task space. This reward design enables us to avoid reliance on costly human preference annotations or external reward models. To validate our framework, we develop an automated simulation pipeline to generate a large-scale, preference-based dataset for the challenging task of scientific diagram generation. Comprehensive experiments, including ablations, subjective and objective evaluations, and tests across user expertise levels, demonstrate the effectiveness of our proposed framework. Nous achieves leading efficiency and output quality, while remaining robust to varying user expertise. In conclusion, our research provides a systematic methodology and a new perspective for addressing the issue of ambiguous intentions in complex human-machine collaboration.

2510.22984 2026-02-02 cs.LG cs.NE

Equivariant Neural Networks for General Linear Symmetries on Lie Algebras

Chankyo Kim, Sicheng Zhao, Minghan Zhu, Tzu-Yuan Lin, Maani Ghaffari

Comments 36 pages, 8 figures

详情
英文摘要

Many scientific and geometric problems exhibit general linear symmetries, yet most equivariant neural networks are built for compact groups or simple vector features, limiting their reuse on matrix-valued data such as covariances, inertias, or shape tensors. We introduce Reductive Lie Neurons (ReLNs), an exactly GL(n)-equivariant architecture that natively supports matrix-valued and Lie-algebraic features. ReLNs resolve a central stability issue for reductive Lie algebras by introducing a non-degenerate adjoint (conjugation)-invariant bilinear form, enabling principled nonlinear interactions and invariant feature construction in a single architecture that transfers across subgroups without redesign. We demonstrate ReLNs on algebraic tasks with sl(3) and sp(4) symmetries, Lorentz-equivariant particle physics, uncertainty-aware drone state estimation via joint velocity-covariance processing, learning from 3D Gaussian-splat representations, and EMLP double-pendulum benchmark spanning multiple symmetry groups. ReLNs consistently match or outperform strong equivariant and self-supervised baselines while using substantially fewer parameters and compute, improving the accuracy-efficiency trade-off and providing a practical, reusable backbone for learning with broad linear symmetries. Project page: https://reductive-lie-neuron.github.io/

2510.22042 2026-02-02 cs.CL cs.AI

Emotions Where Art Thou: Understanding and Characterizing the Emotional Latent Space of Large Language Models

Benjamin Reichman, Adar Avsian, Larry Heck

详情
英文摘要

This work investigates how large language models (LLMs) internally represent emotion by analyzing the geometry of their hidden-state space. The paper identifies a low-dimensional emotional manifold and shows that emotional representations are directionally encoded, distributed across layers, and aligned with interpretable dimensions. These structures are stable across depth and generalize to eight real-world emotion datasets spanning five languages. Cross-domain alignment yields low error and strong linear probe performance, indicating a universal emotional subspace. Within this space, internal emotion perception can be steered while preserving semantics using a learned intervention module, with especially strong control for basic emotions across languages. These findings reveal a consistent and manipulable affective geometry in LLMs and offer insight into how they internalize and process emotion.

2510.21800 2026-02-02 cs.LG math.OC stat.ML

MARS-M: When Variance Reduction Meets Matrices

Yifeng Liu, Angela Yuan, Quanquan Gu

详情
英文摘要

Matrix-based preconditioned optimizers, such as Muon, have recently been shown to be more efficient than scalar-based optimizers for training large-scale neural networks, including large language models (LLMs). Recent benchmark studies of LLM pretraining optimizers have demonstrated that variance-reduction techniques such as MARS can substantially speed up training compared with standard optimizers that do not employ variance reduction. In this paper, we introduce MARS-M, a new optimizer that integrates MARS-style variance reduction with Muon. Under standard regularity conditions, we prove that MARS-M converges to a first-order stationary point at a rate of $\tilde{\mathcal{O}}(T^{-1/3})$, improving upon the $\tilde{\mathcal{O}}(T^{-1/4})$ rate attained by Muon. Empirical results on language modeling and computer vision tasks demonstrate that MARS-M consistently yields lower losses and improved performance across various downstream benchmarks. The implementation of MARS-M is available at https://github.com/AGI-Arena/MARS/tree/main/MARS_M.

2510.15522 2026-02-02 cs.CL

LLM Latent Reasoning as Chain of Superposition

Jingcheng Deng, Liang Pang, Zihao Wei, Shicheng Xu, Zenghao Duan, Kun Xu, Yang Song, Huawei Shen, Xueqi Cheng

详情
英文摘要

Latent reasoning offers a computation-efficient alternative to Chain-of-Thought but often suffers from performance degradation due to distributional misalignment and ambiguous chain definitions. Ideally, latent reasoning should function as a superposition of multiple reasoning paths. To realize this, we introduce Latent-SFT, a unified framework addressing challenges at three levels: token, chain, and learning. First, we define the Latent-Vocab to constrain hidden states within the pre-trained vocab-space. Second, we construct the Latent-Chain via Induction-Supervision Masking to ensure semantic compactness and sufficiency. Third, we employ Latent-Optim with stochastic Gumbel-Softmax to guide the model toward generalizable solutions. Empirical results demonstrate that Latent-SFT consistently outperforms explicit SFT across six mathematical benchmarks (e.g., GSM8k, AIME24) while achieving a 2.7x to 5.5x reduction in reasoning length. Analysis confirms that our method effectively captures a superposition of diverse reasoning trajectories rather than merely compressing a single path.

2510.15336 2026-02-02 cs.RO cs.SY eess.SY

Adaptive Cost-Map-based Path Planning in Partially Unknown Environments with Movable Obstacles

Liviu-Mihai Stan, Ranulfo Bezerra, Shotaro Kojima, Tsige Tadesse Alemayoh, Satoshi Tadokoro, Masashi Konyo, Kazunori Ohno

详情
英文摘要

Reliable navigation in disaster-response and other unstructured indoor settings requires robots not only to avoid obstacles but also to recognise when those obstacles can be pushed aside. We present an adaptive, LiDAR and odometry-based path-planning framework that embeds this capability into the ROS2 Nav2 stack. A new Movable Obstacles Layer labels all LiDAR returns missing from a prior static map as tentatively movable and assigns a reduced traversal cost. A companion Slow-Pose Progress Checker monitors the ratio of commanded to actual velocity; when the robot slows appreciably, the local cost is raised from light to heavy, and on a stall to lethal, prompting the global planner to back out and re-route. Gazebo evaluations on a Scout Mini, spanning isolated objects and cluttered corridors, show higher goal-reach rates and fewer deadlocks than a no-layer baseline, with traversal times broadly comparable. Because the method relies only on planar scans and CPU-level computation, it suits resource-constrained search and rescue robots and integrates into heterogeneous platforms with minimal engineering. Overall, the results indicate that interaction-aware cost maps are a lightweight, ROS2-native extension for navigating among potentially movable obstacles in unstructured settings. The full implementation will be released as open source athttps://costmap-namo.github.io.

2510.14256 2026-02-02 cs.CV

Identity-GRPO: Optimizing Multi-Human Identity-preserving Video Generation via Reinforcement Learning

Xiangyu Meng, Zixian Zhang, Zhenghao Zhang, Junchao Liao, Long Qin, Weizhi Wang

Comments Our project and code are available at https://ali-videoai.github.io/identity_page, https://github.com/alibaba/identity-grpo

详情
英文摘要

While advanced methods like VACE and Phantom have advanced video generation for specific subjects in diverse scenarios, they struggle with multi-human identity preservation in dynamic interactions, where consistent identities across multiple characters are critical. To address this, we propose Identity-GRPO, a human feedback-driven optimization pipeline for refining multi-human identity-preserving video generation. First, we construct a video reward model trained on a large-scale preference dataset containing human-annotated and synthetic distortion data, with pairwise annotations focused on maintaining human consistency throughout the video. We then employ a GRPO variant tailored for multi-human consistency, which greatly enhances both VACE and Phantom. Through extensive ablation studies, we evaluate the impact of annotation quality and design choices on policy optimization. Experiments show that Identity-GRPO achieves up to 18.9% improvement in human consistency metrics over baseline methods, offering actionable insights for aligning reinforcement learning with personalized video generation.

2510.11062 2026-02-02 cs.LG cs.MA

Stronger-MAS: Multi-Agent Reinforcement Learning for Collaborative LLMs

Yujie Zhao, Lanxiang Hu, Yang Wang, Minmin Hou, Hao Zhang, Ke Ding, Jishen Zhao

详情
英文摘要

Multi-agent systems (MAS) and reinforcement learning (RL) are widely used to enhance the agentic capabilities of large language models (LLMs). MAS improves task performance through role-based orchestration, while RL uses environmental rewards to learn stronger policies, such as GRPO-style optimization. However, applying on-policy RL to MAS remains underexplored and presents unique challenges. Algorithmically, standard GRPO grouping assumptions break down because prompts vary by role and by turn. System-wise, the training stack must support MAS-workflow rollouts and on-policy updates for both single-policy and multi-policy models. We propose AT-GRPO, which includes (i) an agent- and turn-wise grouped RL algorithm tailored to MAS and (ii) a training system that supports both single- and multi-policy regimes. Across game, planning, coding, and math tasks, AT-GRPO delivers substantial gains. On long-horizon planning, it increases accuracy from a 14.0 to 47.0 percent single-agent RL baseline to 96.0 to 99.5 percent. It also improves reasoning performance, with average gains of 3.87 to 7.62 percent on coding tasks and 9.0 to 17.93 percent on math. Code and environments are available at: https://github.com/pettingllms-ai/PettingLLMs.

2510.10909 2026-02-02 cs.AI

PaperArena: An Evaluation Benchmark for Tool-Augmented Agentic Reasoning on Scientific Literature

Daoyu Wang, Mingyue Cheng, Shuo Yu, Zirui Liu, Ze Guo, Xin Li, Qi Liu

Comments 19 pages, 13 figures

详情
英文摘要

Understanding and reasoning on the large-scale scientific literature is a crucial touchstone for large language model (LLM) based agents. However, existing works are mainly restricted to tool-free tasks within single papers, largely due to the lack of a benchmark that evaluates cross-paper reasoning and multi-tool orchestration in authentic research scenarios. In this work, we propose PaperArena, a benchmark to evaluate LLM-based agents on questions that require integrating information across multiple papers with the assistance of external tools. Given a research question, agents should formulate a reasoning plan, interact with multiple papers, and invoke appropriate tools to produce a well-grounded answer. To support standardized evaluation, we provide a platform for agent execution, offering a modular tool environment including multimodal parsing, context retrieval, and programmatic computation. Experiments reveal that even the leading LLM powering a well-established agentic workflow achieves merely 38.78% average accuracy, while on the hard subset, accuracy drops to only 18.47%. We also analyze reasoning traces and diagnose agent behavior, providing the community with insights to develop and evaluate more capable scientific agents.

2510.10197 2026-02-02 cs.AI

Don't Just Fine-tune the Agent, Tune the Environment

Siyuan Lu, Zechuan Wang, Hongxuan Zhang, Qintong Wu, Leilei Gan, Chenyi Zhuang, Jinjie Gu, Tao Lin

Comments ICLR 2026

详情
英文摘要

Large Language Model (LLM) agents show great promise for complex, multi-turn tool-use tasks, but their development is often hampered by the extreme scarcity of high-quality training data. Supervised fine-tuning (SFT) on synthetic data leads to overfitting, whereas standard reinforcement learning (RL) struggles with a critical cold-start problem and training instability. To address these challenges, we introduce $\textbf{Environment Tuning}$, a novel training paradigm that enables agents to learn complex behaviors directly from problem instances without relying on pre-collected expert trajectories. $\textbf{Environment Tuning}$ orchestrates this learning process through a structured curriculum, actionable environment augmentation that provides corrective feedback, and fine-grained progress rewards to ensure stable and efficient exploration. Using only 400 problem instances from Berkeley Function-Calling Leaderboard (BFCL) benchmark, our method not only achieves competitive in-distribution performance against strong baselines but also demonstrates superior out-of-distribution generalization, overcoming the performance collapse common to SFT-based approaches. Our work presents a paradigm shift from supervised fine-tuning on static trajectories to dynamic, environment-based exploration, paving the way for training more robust and data-efficient agents. The code is available at https://github.com/inclusionAI/AWorld-RL/tree/main/EnvTuning.

2510.08613 2026-02-02 cs.CL

GraphGhost: Tracing Structures Behind Large Language Models

Xinnan Dai, Xianxuan Long, Chung-Hsiang Lo, Kai Guo, Shenglai Zeng, Dongsheng Luo, Jiliang Tang

详情
英文摘要

Large Language Models (LLMs) exhibit strong reasoning capabilities on structured tasks, yet the internal mechanisms underlying such behaviors remain poorly understood. Existing interpretation methods mainly focus on token-level attributions, which provide limited insight into multi-step reasoning inside the model. We propose GraphGhost, a graph-based framework that models internal token interactions and neuron activations in LLMs as graphs. By aggregating token dependencies traced across layers, GraphGhost captures global information flow underlying model predictions. We formalize GraphGhost from two complementary perspectives: a sample view, which traces token dependencies for individual predictions, and a dataset view, which aggregates recurring structural patterns learned during training. Through graph analytics and quantitative experiments, we show that graph structural properties are closely associated with influential tokens and neuron nodes, and that perturbations to structurally critical nodes lead to measurable changes in reasoning behavior. These results indicate that the structural patterns captured by GraphGhost reflect meaningful internal organization of LLM reasoning. The codes are available at software part. Artifacts will be made available for research use only.

2510.08341 2026-02-02 cs.LG cs.AI

Post-Norm can Resharpen Attention

Pál Zsámboki, Benjamin Levi, David Ansel Josef Smith, Mitansh Kagalwala, Arlington Kell, Samuel Liechty, Cong Wang

Comments 17 pages, 7 figures, 1 table

详情
英文摘要

Length Generalization is the essential capacity of autonomous agents to perform tasks in longer contexts than those encountered during training. To systematically study this feat, we test how well models can approximate the next token distributions in algorithmic tasks. This is to take into account the realistic possibility of multiple next tokens being legal. We present a prototypical benchmark for this line of study: in the Set Complement Task, the model needs to output a uniform distribution over tokens not in the input. We prove a theorem that states simple transformers can length generalize on this task, however, with performance degradation due to attention dispersion. A mechanistic reading of how dispersion takes effect lets us discover a remedy: Post-Norm can Resharpen Attention. We present experimental evidence to support this idea. We also show that Exponential Moving Averages can help the issue of noisy gradients that arises when many next tokens are legal. We validate the general applicability of our proposed methods on a suite of formal language experiments. Our source code will be available upon publication.