arXivDaily arXiv每日学术速递 周一至周五更新
重置
全部学科分类 1556
2510.02276 2026-02-25 cs.AI

BioX-Bridge: Model Bridging for Unsupervised Cross-Modal Knowledge Transfer across Biosignals

Chenqi Li, Yu Liu, Timothy Denison, Tingting Zhu

详情
英文摘要

Biosignals offer valuable insights into the physiological states of the human body. Although biosignal modalities differ in functionality, signal fidelity, sensor comfort, and cost, they are often intercorrelated, reflecting the holistic and interconnected nature of human physiology. This opens up the possibility of performing the same tasks using alternative biosignal modalities, thereby improving the accessibility, usability, and adaptability of health monitoring systems. However, the limited availability of large labeled datasets presents challenges for training models tailored to specific tasks and modalities of interest. Unsupervised cross-modal knowledge transfer offers a promising solution by leveraging knowledge from an existing modality to support model training for a new modality. Existing methods are typically based on knowledge distillation, which requires running a teacher model alongside student model training, resulting in high computational and memory overhead. This challenge is further exacerbated by the recent development of foundation models that demonstrate superior performance and generalization across tasks at the cost of large model sizes. To this end, we explore a new framework for unsupervised cross-modal knowledge transfer of biosignals by training a lightweight bridge network to align the intermediate representations and enable information flow between foundation models and across modalities. Specifically, we introduce an efficient strategy for selecting alignment positions where the bridge should be constructed, along with a flexible prototype network as the bridge architecture. Extensive experiments across multiple biosignal modalities, tasks, and datasets show that BioX-Bridge reduces the number of trainable parameters by 88--99\% while maintaining or even improving transfer performance compared to state-of-the-art methods.

2509.26626 2026-02-25 cs.LG

Recursive Self-Aggregation Unlocks Deep Thinking in Large Language Models

Siddarth Venkatraman, Vineet Jain, Sarthak Mittal, Vedant Shah, Johan Obando-Ceron, Yoshua Bengio, Brian R. Bartoldson, Bhavya Kailkhura, Guillaume Lajoie, Glen Berseth, Nikolay Malkin, Moksh Jain

Comments 23 pages, 10 figures. Project page: https://rsa-llm.github.io/

详情
英文摘要

Test-time scaling methods improve the capabilities of large language models (LLMs) by increasing the amount of compute used during inference to make a prediction. Inference-time compute can be scaled in parallel by choosing among multiple independent solutions or sequentially through self-refinement. We propose Recursive Self-Aggregation (RSA), a test-time scaling method inspired by evolutionary methods that combines the benefits of both parallel and sequential scaling. Each step of RSA refines a population of candidate reasoning chains through aggregation of subsets to yield a population of improved solutions, which are then used as the candidate pool for the next iteration. Empirically, RSA delivers substantial performance gains with increasing compute budgets across diverse tasks, model families and sizes. Notably, RSA with Gemini 3 Flash attains performance near the top of the ARC-AGI-2 public leaderboard. RSA also enables Qwen3-4B-Instruct-2507 to achieve competitive performance with larger reasoning models, including DeepSeek-R1 and o3-mini (high), outperforming purely parallel and sequential scaling strategies across AIME-25, HMMT-25, Reasoning Gym, LiveCodeBench-v6, and SuperGPQA. We further propose a novel aggregation-aware reinforcement learning approach that yields significant performance gains by training the model to combine solutions.

2509.26314 2026-02-25 cs.CL

Latent Thinking Optimization: Your Latent Reasoning Language Model Secretly Encodes Reward Signals in Its Latent Thoughts

Hanwen Du, Yuxin Dong, Xia Ning

详情
英文摘要

Large Language Models (LLMs) excel at problem solving by generating chain of thoughts in natural language, but such verbal thinking is computationally costly and prone to overthinking. A recent work instead proposes a latent thinking architecture, Huginn-3.5B, which represents intermediate reasoning steps as a sequence of latent representations. However, latent thoughts lack interpretability and are difficult to supervise, raising concerns about the correctness and reliability of the model's latent thinking processes. In this paper, we provide a systematic study of how Huginn-3.5B thinks in the latent space and how external supervision signals can improve its latent thinking processes. We show that latent thoughts leading to correct versus incorrect answers exhibit highly distinguishable patterns, and that a latent classifier can reliably predict answer correctness directly from latent thoughts. Leveraging these insights, we propose Latent Thinking Optimization (LTO), a probabilistic algorithm that employs the latent classifier as a Latent Reward Model (LRM) to optimize the latent thinking processes. Extensive experiments across diverse reasoning tasks demonstrate that LRM is highly effective in detecting incorrect latent thinking patterns, and LTO can significantly improve the latent thinking processes. Furthermore, we show that LRM can generalize across diverse domains, and LTO can be seamlessly applied to general LLMs to improve their thinking processes. In contrast to verbal thinking, our method demonstrates that reward modeling and scaling test-time thinking with supervision can be performed directly in the latent space, highlighting its potential as a general, efficient, and domain-agnostic approach to improving the thinking processes of LLMs.

2509.12261 2026-02-25 cs.SD eess.AS

An Adaptive CMSA for Solving the Longest Filled Common Subsequence Problem with an Application in Audio Querying

Marko Djukanovic, Christian Blum, Aleksandar Kartelj, Ana Nikolikj, Guenther Raidl

详情
英文摘要

This paper addresses the Longest Filled Common Subsequence (LFCS) problem, a challenging NP-hard problem with applications in bioinformatics, including gene mutation prediction and genomic data reconstruction. Existing approaches, including exact, metaheuristic, and approximation algorithms, have primarily been evaluated on small-sized instances, which offer limited insights into their scalability. In this work, we introduce a new benchmark dataset with significantly larger instances and demonstrate that existing datasets lack the discriminative power needed to meaningfully assess algorithm performance at scale. To solve large instances efficiently, we utilize an adaptive Construct, Merge, Solve, Adapt (CMSA) framework that iteratively generates promising subproblems via component-based construction and refines them using feedback from prior iterations. Subproblems are solved using an external black-box solver. Extensive experiments on both standard and newly introduced benchmarks show that the proposed adaptive CMSA achieves state-of-the-art performance, outperforming five leading methods. Notably, on 1,510 problem instances with known optimal solutions, our approach solves 1,486 of them -- achieving over 99.9% optimal solution quality and demonstrating exceptional scalability. We additionally propose a novel application of LFCS for song identification from degraded audio excerpts as an engineering contribution, using real-world energy-profile instances from popular music. Finally, we conducted an empirical explainability analysis to identify critical feature combinations influencing algorithm performance, i.e., the key problem features contributing to success or failure of the approaches across different instance types are revealed.

2508.21785 2026-02-25 cs.LG cs.CV

Learning Unified Representations from Heterogeneous Data for Robust Heart Rate Modeling

Zhengdong Huang, Zicheng Xie, Wentao Tian, Jingyu Liu, Lunhong Dong, Peng Yang

详情
英文摘要

Heart rate prediction is vital for personalized health monitoring and fitness, while it frequently faces a critical challenge in real-world deployment: data heterogeneity. We classify it in two key dimensions: source heterogeneity from fragmented device markets with varying feature sets, and user heterogeneity reflecting distinct physiological patterns across individuals and activities. Existing methods either discard device-specific information, or fail to model user-specific differences, limiting their real-world performance. To address this, we propose a framework that learns latent representations agnostic to both heterogeneity,enabling downstream predictors to work consistently under heterogeneous data patterns. Specifically, we introduce a random feature dropout strategy to handle source heterogeneity, making the model robust to various feature sets. To manage user heterogeneity, we employ a history-aware attention module to capture long-term physiological traits and use a contrastive learning objective to build a discriminative representation space. To reflect the heterogeneous nature of real-world data, we created a new benchmark dataset, PARROTAO. Evaluations on both PARROTAO and the public FitRec dataset show that our model significantly outperforms existing baselines by 17.5% and 10.4% in terms of test MSE, respectively. Furthermore, analysis of the learned representations demonstrates their strong discriminative power,and two downstream application tasks confirm the practical value of our model.

2508.18632 2026-02-25 cs.CV

Decouple, Reorganize, and Fuse: A Multimodal Framework for Cancer Survival Prediction

Huayi Wang, Haochao Ying, Yuyang Xu, Qibo Qiu, Cheng Zhang, Danny Z. Chen, Ying Sun, Jian Wu

Comments 13 pages

详情
英文摘要

Cancer survival analysis commonly integrates information across diverse medical modalities to make survival-time predictions. Existing methods primarily focus on extracting different decoupled features of modalities and performing fusion operations such as concatenation, attention, and \revm{Mixture-of-Experts (MoE)-based fusion. However, these methods still face two key challenges: i) Fixed fusion schemes (concatenation and attention) can lead to model over-reliance on predefined feature combinations, limiting the dynamic fusion of decoupled features; ii) in MoE-based fusion methods, each expert network handles separate decoupled features, which limits information interaction among the decoupled features. To address these challenges, we propose a novel Decoupling-Reorganization-Fusion framework (DeReF), which devises a random feature reorganization strategy between modalities decoupling and dynamic MoE fusion modules.Its advantages are: i) it increases the diversity of feature combinations and granularity, enhancing the generalization ability of the subsequent expert networks; ii) it overcomes the problem of information closure and helps expert networks better capture information among decoupled features. Additionally, we incorporate a regional cross-attention network within the modality decoupling module to improve the representation quality of decoupled features. Extensive experimental results on our in-house Liver Cancer (LC) and three widely used TCGA public datasets confirm the effectiveness of our proposed method. Codes are available at https://github.com/ZJUMAI/DeReF.

2508.16815 2026-02-25 cs.LG

Uncertainty Propagation Networks for Neural Ordinary Differential Equations

Hadi Jahanshahi, Zheng H. Zhu

详情
Journal ref
Neurocomputing, 2026, 133134
英文摘要

This paper introduces Uncertainty Propagation Network (UPN), a novel family of neural differential equations that naturally incorporate uncertainty quantification into continuous-time modeling. Unlike existing neural ODEs that predict only state trajectories, UPN simultaneously model both state evolution and its associated uncertainty by parameterizing coupled differential equations for mean and covariance dynamics. The architecture efficiently propagates uncertainty through nonlinear dynamics without discretization artifacts by solving coupled ODEs for state and covariance evolution while enabling state-dependent, learnable process noise. The continuous-depth formulation adapts its evaluation strategy to each input's complexity, provides principled uncertainty quantification, and handles irregularly-sampled observations naturally. Experimental results demonstrate UPN's effectiveness across multiple domains: continuous normalizing flows (CNFs) with uncertainty quantification, time-series forecasting with well-calibrated confidence intervals, and robust trajectory prediction in both stable and chaotic dynamical systems.

2508.15949 2026-02-25 cs.LG math.OC

An Efficient Hybridization of Graph Representation Learning and Metaheuristics for the Constrained Incremental Graph Drawing Problem

Bruna C. B. Charytitsch, Mariá C. V. Nascimento

Comments The paper has been accepted for publication in the European Journal of Operational Research. Supplementary material will be available on the journal website or upon request

详情
Journal ref
European Journal of Operational Research Volume 330, Issue 2, 16 April 2026, Pages 381-397
英文摘要

Hybridizing machine learning techniques with metaheuristics has attracted significant attention in recent years. Many attempts employ supervised or reinforcement learning to support the decision-making of heuristic methods. However, in some cases, these techniques are deemed too time-consuming and not competitive with hand-crafted heuristics. This paper proposes a hybridization between metaheuristics and a less expensive learning strategy to extract the latent structure of graphs, known as Graph Representation Learning (GRL). For such, we approach the Constrained Incremental Graph Drawing Problem (C-IGDP), a hierarchical graph visualization problem. There is limited literature on methods for this problem, for which Greedy Randomized Search Procedures (GRASP) heuristics have shown promising results. In line with this, this paper investigates the gains of incorporating GRL into the construction phase of GRASP, which we refer to as Graph Learning GRASP (GL-GRASP). In computational experiments, we first analyze the results achieved considering different node embedding techniques, where deep learning-based strategies stood out. The evaluation considered the primal integral measure that assesses the quality of the solutions according to the required time for such. According to this measure, the best GL-GRASP heuristics demonstrated superior performance than state-of-the-art literature GRASP heuristics for the problem. A scalability test on newly generated denser instances under a fixed time limit further confirmed the robustness of the GL-GRASP heuristics.

2508.10453 2026-02-25 cs.CV

Trajectory-aware Shifted State Space Models for Online Video Super-Resolution

Qiang Zhu, Xiandong Meng, Yuxian Jiang, Fan Zhang, David Bull, Shuyuan Zhu, Bing Zeng, Ronggang Wang

Comments ICLR2026

详情
英文摘要

Online video super-resolution (VSR) is an important technique for many real-world video processing applications, which aims to restore the current high-resolution video frame based on temporally previous frames. Most of the existing online VSR methods solely employ one neighboring previous frame to achieve temporal alignment, which limits long-range temporal modeling of videos. Recently, state space models (SSMs) have been proposed with linear computational complexity and a global receptive field, which significantly improve computational efficiency and performance. In this context, this paper presents a novel online VSR method based on Trajectory-aware Shifted SSMs (TS-Mamba), leveraging both long-term trajectory modeling and low-complexity Mamba to achieve efficient spatio-temporal information aggregation. Specifically, TS-Mamba first constructs the trajectories within a video to select the most similar tokens from the previous frames. Then, a Trajectory-aware Shifted Mamba Aggregation (TSMA) module consisting of proposed shifted SSMs blocks is employed to aggregate the selected tokens. The shifted SSMs blocks are designed based on Hilbert scannings and corresponding shift operations to compensate for scanning losses and strengthen the spatial continuity of Mamba. Additionally, we propose a trajectory-aware loss function to supervise the trajectory generation, ensuring the accuracy of token selection when training our model. Extensive experiments on three widely used VSR test datasets demonstrate that compared with six online VSR benchmark models, our TS-Mamba achieves state-of-the-art performance in most cases and over 22.7% complexity reduction (in MACs).

2508.06878 2026-02-25 cs.CV cs.AI

Seeing Through the Noise: Improving Infrared Small Target Detection and Segmentation from Noise Suppression Perspective

Maoxun Yuan, Duanni Meng, Ziteng Xi, Tianyi Zhao, Shiji Zhao, Yimian Dai, Xingxing Wei

详情
英文摘要

Infrared small target detection and segmentation (IRSTDS) is a critical yet challenging task in defense and civilian applications, owing to the dim, shapeless appearance of targets and severe background clutter. Recent CNN-based methods have achieved promising target perception results, but they only focus on enhancing feature representation to offset the impact of noise, which results in the increased false alarm problem. In this paper, through analyzing the problem from the frequency domain, we pioneer in improving performance from noise suppression perspective and propose a novel noise-suppression feature pyramid network (NS-FPN), which integrates a low-frequency guided feature purification (LFP) module and a spiral-aware feature sampling (SFS) module into the original FPN structure. The LFP module suppresses the noise features by purifying high-frequency components to achieve feature enhancement devoid of noise interference, while the SFS module further adopts spiral sampling to fuse target-relevant features in feature fusion process. Our NS-FPN is designed to be lightweight yet effective and can be easily plugged into existing IRSTDS frameworks. Extensive experiments on the IRSTD-1k and NUAA-SIRST datasets demonstrate that our method significantly reduces false alarms and achieves superior performance on IRSTDS task.

2508.03616 2026-02-25 cs.AI

Hidden Dynamics of Massive Activations in Transformer Training

Jorge Gallego-Feliciano, S. Aaron McClendon, Juan Morinelli, Stavros Zervoudakis, Antonios Saravanos

详情
英文摘要

We present the first comprehensive analysis of massive activation development throughout transformer training, using the Pythia model family as our testbed, and release our full dataset publicly to support further research. Through systematic analysis of various model sizes across multiple training checkpoints, we demonstrate that massive activation emergence follows highly predictable mathematical patterns that can be accurately modeled using an exponentially-modulated logarithmic function with five key parameters. Additionally, We develop a machine learning framework to predict these mathematical parameters from architectural specifications alone, achieving high accuracy for steady-state behavior and moderate accuracy for emergence timing and magnitude. These findings enable architects to predict and potentially control key aspects of massive activation emergence through design choices, with significant implications for model stability, training cycle length, interpretability, and optimization. Our findings demonstrate that the emergence of massive activations is governed by model design and can be anticipated, and potentially controlled, before training begins. Code is available at https://github.com/Aimpoint-Digital/massive-activations-fork

2507.15444 2026-02-25 cs.RO cs.CV

Low-Latency Event-Based Velocimetry for Quadrotor Control in a Narrow Pipe

Leonard Bauersfeld, Davide Scaramuzza

Comments 19 pages

详情
Journal ref
in IEEE Transactions on Robotics, vol. 42, pp. 1-19, 2026
英文摘要

Autonomous quadrotor flight in confined spaces such as pipes and tunnels presents significant challenges due to unsteady, self-induced aerodynamic disturbances. Very recent advances have enabled flight in such conditions, but they either rely on constant motion through the pipe to mitigate airflow recirculation effects or suffer from limited stability during hovering. In this work, we present the first closed-loop control system for quadrotors for hovering in narrow pipes that leverages real-time flow field measurements. We develop a low-latency, event-based smoke velocimetry method that estimates local airflow at high temporal resolution. This flow information is used by a disturbance estimator based on a recurrent convolutional neural network, which infers force and torque disturbances in real time. The estimated disturbances are integrated into a learning-based controller trained via reinforcement learning. The flow-feedback control proves particularly effective during lateral translation maneuvers in the pipe cross-section. There, the real-time disturbance information enables the controller to effectively counteract transient aerodynamic effects, thereby preventing collisions with the pipe wall. To the best of our knowledge, this work represents the first demonstration of an aerial robot with closed-loop control informed by real-time flow field measurements. This opens new directions for research on flight in aerodynamically complex environments. In addition, our work also sheds light on the characteristic flow structures that emerge during flight in narrow, circular pipes, providing new insights at the intersection of robotics and fluid dynamics.

2507.04448 2026-02-25 cs.LG cond-mat.dis-nn stat.ML

Transfer Learning in Infinite Width Feature Learning Networks

Clarissa Lauditi, Blake Bordelon, Cengiz Pehlevan

详情
英文摘要

We develop a theory of transfer learning in infinitely wide neural networks under gradient flow that quantifies when pretraining on a source task improves generalization on a target task. We analyze both (i) fine-tuning, when the downstream predictor is trained on top of source-induced features and (ii) a jointly rich setting, where both pretraining and downstream tasks can operate in a feature learning regime, but the downstream model is initialized with the features obtained after pre-training. In this setup, the summary statistics of randomly initialized networks after a rich pre-training are adaptive kernels which depend on both source data and labels. For (i), we analyze the performance of a readout for different pretraining data regimes. For (ii), the summary statistics after learning the target task are still adaptive kernels with features from both source and target tasks. We test our theory on linear and polynomial regression tasks as well as real datasets. Our theory allows interpretable conclusions on performance, which depend on the amount of data on both tasks, the alignment between tasks, and the feature learning strength.

2507.04002 2026-02-25 cs.CV cs.RO eess.IV

NRSeg: Noise-Resilient Learning for BEV Semantic Segmentation via Driving World Models

Siyu Li, Fei Teng, Yihong Cao, Kailun Yang, Zhiyong Li, Yaonan Wang

Comments Accepted to IEEE Transactions on Image Processing (TIP). The source code will be made publicly available at https://github.com/lynn-yu/NRSeg

详情
英文摘要

Birds' Eye View (BEV) semantic segmentation is an indispensable perception task in end-to-end autonomous driving systems. Unsupervised and semi-supervised learning for BEV tasks, as pivotal for real-world applications, underperform due to the homogeneous distribution of the labeled data. In this work, we explore the potential of synthetic data from driving world models to enhance the diversity of labeled data for robustifying BEV segmentation. Yet, our preliminary findings reveal that generation noise in synthetic data compromises efficient BEV model learning. To fully harness the potential of synthetic data from world models, this paper proposes NRSeg, a noise-resilient learning framework for BEV semantic segmentation. Specifically, a Perspective-Geometry Consistency Metric (PGCM) is proposed to quantitatively evaluate the guidance capability of generated data for model learning. This metric originates from the alignment measure between the perspective road mask of generated data and the mask projected from the BEV labels. Moreover, a Bi-Distribution Parallel Prediction (BiDPP) is designed to enhance the inherent robustness of the model, where the learning process is constrained through parallel prediction of multinomial and Dirichlet distributions. The former efficiently predicts semantic probabilities, whereas the latter adopts evidential deep learning to realize uncertainty quantification. Furthermore, a Hierarchical Local Semantic Exclusion (HLSE) module is designed to address the non-mutual exclusivity inherent in BEV semantic segmentation tasks. Experimental results demonstrate that NRSeg achieves state-of-the-art performance, yielding the highest improvements in mIoU of 13.8% and 11.4% in unsupervised and semi-supervised BEV segmentation tasks, respectively. The source code will be made publicly available at https://github.com/lynn-yu/NRSeg.

2507.03854 2026-02-25 cs.LG cs.SD cs.SY eess.AS eess.SY nlin.AO stat.ML

Latent FxLMS: Accelerating Active Noise Control with Neural Adaptive Filters

Kanad Sarkar, Austin Lu, Manan Mittal, Yongjie Zhuang, Ryan Corey, Andrew Singer

Comments 8 pages, Submitted at Forum Acousticum Euronoise 2025

详情
Journal ref
10.61782/fa.2025.0565
英文摘要

Filtered-X LMS (FxLMS) is commonly used for active noise control (ANC), wherein the soundfield is minimized at a desired location. Given prior knowledge of the spatial region of the noise or control sources, we could improve FxLMS by adapting along the low-dimensional manifold of possible adaptive filter weights. We train an auto-encoder on the filter coefficients of the steady-state adaptive filter for each primary source location sampled from a given spatial region and constrain the weights of the adaptive filter to be the output of the decoder for a given state of latent variables. Then, we perform updates in the latent space and use the decoder to generate the cancellation filter. We evaluate how various neural network constraints and normalization techniques impact the convergence speed and steady-state mean squared error. Under certain conditions, our Latent FxLMS model converges in fewer steps with comparable steady-state error to the standard FxLMS.

2506.18777 2026-02-25 cs.AI cs.CL cs.LG

Programming by Backprop: An Instruction is Worth 100 Examples When Finetuning LLMs

Jonathan Cook, Silvia Sapora, Arash Ahmadian, Akbir Khan, Tim Rocktaschel, Jakob Foerster, Laura Ruis

详情
英文摘要

Large language models (LLMs) are typically trained to acquire behaviours from demonstrations or experience, yet much of their training data is declarative: instructions, rules, and descriptions that specify behaviours without showing how to execute them. We introduce Programming by Backprop (PBB): a training regime that enables LLMs to acquire procedural knowledge (i.e., reusable behaviours) from declarative instructions encountered during training. With PBB, instructions in training data provide an opportunity to `program' specific behaviours into model weights. The core principle underpinning PBB is the separation of learning how instructions map to behaviour from internalising new instructions. We devise two distinct PBB curricula that leverage this principle. Through controlled experiments across two domains (algorithmic execution from Python source code and text generation from context-free grammars), we demonstrate the benefit of these curricula over training on a homogeneous data mixture. Crucially, PBB is highly sample efficient, with a single instruction substituting for up to 100 execution examples. Though execution of instructions in training data remains less reliable than when instructions are given in-context, our results demonstrate that procedural knowledge can be noisily `programmed' into LLMs through PBB, with important implications for data curation and safety.

2506.10167 2026-02-25 cs.LG cs.SY eess.SY

Wasserstein Barycenter Soft Actor-Critic

Zahra Shahrooei, Ali Baheri

详情
英文摘要

Deep off-policy actor-critic algorithms have emerged as the leading framework for reinforcement learning in continuous control domains. However, most of these algorithms suffer from poor sample efficiency, especially in environments with sparse rewards. In this paper, we take a step towards addressing this issue by providing a principled directed exploration strategy. We propose Wasserstein Barycenter Soft Actor-Critic (WBSAC) algorithm, which benefits from a pessimistic actor for temporal difference learning and an optimistic actor to promote exploration. This is achieved by using the Wasserstein barycenter of the pessimistic and optimistic policies as the exploration policy and adjusting the degree of exploration throughout the learning process. We compare WBSAC with state-of-the-art off-policy actor-critic algorithms and show that WBSAC is more sample-efficient on MuJoCo continuous control tasks.

2506.04867 2026-02-25 cs.AI cs.HC cs.LG cs.RO

Sensory-Motor Control with Large Language Models via Iterative Policy Refinement

Jônata Tyska Carvalho, Stefano Nolfi

Comments Final version of the article accepted for publication on Scientific Reports. 29 pages (13 pages are from appendix), 8 figures, 2 tables, code for experiments replication and supplementary material provided at https://github.com/jtyska/llm-robotics-article/

详情
英文摘要

We propose a method that enables large language models (LLMs) to control embodied agents through the generation of control policies that directly map continuous observation vectors to continuous action vectors. At the outset, the LLMs generate a control strategy based on a textual description of the agent, its environment, and the intended goal. This strategy is then iteratively refined through a learning process in which the LLMs are repeatedly prompted to improve the current strategy, using performance feedback and sensory-motor data collected during its evaluation. The method is validated on classic control tasks from the Gymnasium library and the inverted pendulum task from the MuJoCo library. The approach proves effective with relatively compact models such as GPT-oss:120b and Qwen2.5:72b. In most cases, it successfully identifies optimal or near-optimal solutions by integrating symbolic knowledge derived through reasoning with sub-symbolic sensory-motor data gathered as the agent interacts with its environment.

2505.14685 2026-02-25 cs.CL

Language Models use Lookbacks to Track Beliefs

Nikhil Prakash, Natalie Shapira, Arnab Sen Sharma, Christoph Riedl, Yonatan Belinkov, Tamar Rott Shaham, David Bau, Atticus Geiger

Comments 38 pages, 50 figures. Code and data at https://belief.baulab.info/

详情
英文摘要

How do language models (LMs) represent characters' beliefs, especially when those beliefs may differ from reality? This question lies at the heart of understanding the Theory of Mind (ToM) capabilities of LMs. We analyze LMs' ability to reason about characters' beliefs using causal mediation and abstraction. We construct a dataset, CausalToM, consisting of simple stories where two characters independently change the state of two objects, potentially unaware of each other's actions. Our investigation uncovers a pervasive algorithmic pattern that we call a lookback mechanism, which enables the LM to recall important information when it becomes necessary. The LM binds each character-object-state triple together by co-locating their reference information, represented as Ordering IDs (OIs), in low-rank subspaces of the state token's residual stream. When asked about a character's beliefs regarding the state of an object, the binding lookback retrieves the correct state OI and then the answer lookback retrieves the corresponding state token. When we introduce text specifying that one character is (not) visible to the other, we find that the LM first generates a visibility ID encoding the relation between the observing and the observed character OIs. In a visibility lookback, this ID is used to retrieve information about the observed character and update the observing character's beliefs. Our work provides insights into belief tracking mechanisms, taking a step toward reverse-engineering ToM reasoning in LMs.

2505.11602 2026-02-25 cs.LG math.DS math.OC stat.ML

Regularity and Stability Properties of Selective SSMs with Discontinuous Gating

Nikola Zubić, Davide Scaramuzza

Comments 26 pages, 6 theorems, 2 figures, 1 table

详情
英文摘要

Deep selective State-Space Models (SSMs), whose state-space parameters are modulated online by a selection signal, offer significant expressive power but pose challenges for stability analysis, especially under discontinuous gating. We study continuous-time selective SSMs through the lenses of passivity and Input-to-State Stability (ISS), explicitly distinguishing the selection schedule $x(\cdot)$ from the driving (port) input $u(\cdot)$. First, we show that state-strict dissipativity ($β>0$) together with quadratic bounds on a storage functional implies exponential decay of homogeneous trajectories ($u\equiv 0$), yielding exponential forgetting. Second, by freezing the selection ($x(t)\equiv 0$) we obtain a passive LTV input-output subsystem and prove that its minimal available storage is necessarily quadratic, $V_{a,0}(t,h)=\tfrac{1}{2}h^H Q_0(t)h,$ with $Q_0 \in \mathrm{AUC}_{\mathrm{loc}}$, accommodating discontinuities induced by gating. Third, under the strong hypothesis that a single quadratic storage certifies passivity uniformly over all admissible selection schedules, we derive a parametric LMI and universal kernel constraints on gating, formalizing an "irreversible forgetting" structure. Finally, we give sufficient conditions for global ISS with respect to the port input $u(\cdot)$, uniformly over admissible selection schedules, and we validate the main predictions in targeted simulation studies.

2505.03356 2026-02-25 cs.RO

Effective Reinforcement Learning Control using Conservative Soft Actor-Critic

Zhiwei Shang, Xinyi Yuan, Wenjun Huang, Yunduan Cui, Di Chen, Meixin Zhu

Comments 8 pages, 9 figures

详情
英文摘要

Reinforcement Learning (RL) has shown great potential in complex control tasks, particularly when combined with deep neural networks within the Actor-Critic (AC) framework. However, in practical applications, balancing exploration, learning stability, and sample efficiency remains a significant challenge. Traditional methods such as Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO) address these issues by incorporating entropy or relative entropy regularization, but often face problems of instability and low sample efficiency. In this paper, we propose the Conservative Soft Actor-Critic (CSAC) algorithm, which seamlessly integrates entropy and relative entropy regularization within the AC framework. CSAC improves exploration through entropy regularization while avoiding overly aggressive policy updates with the use of relative entropy regularization. Evaluations on benchmark tasks and real-world robotic simulations demonstrate that CSAC offers significant improvements in stability and efficiency over existing methods. These findings suggest that CSAC provides strong robustness and application potential in control tasks under dynamic environments.

2503.12434 2026-02-25 cs.AI

A Survey on the Optimization of Large Language Model-based Agents

Shangheng Du, Jiabao Zhao, Jinxin Shi, Zhentao Xie, Xin Jiang, Yanhong Bai, Liang He

Comments Published in ACM Computing Surveys, Vol. 58, No. 9, Article 223, July 2026

详情
Journal ref
ACM Computing Surveys 58(9), Article 223, July 2026
英文摘要

With the rapid development of Large Language Models (LLMs), LLM-based agents have been widely adopted in various fields, becoming essential for autonomous decision-making and interactive tasks. However, current work typically relies on prompt design or fine-tuning strategies applied to vanilla LLMs, which often leads to limited effectiveness or suboptimal performance in complex agent-related environments. Although LLM optimization techniques can improve model performance across many general tasks, they lack specialized optimization towards critical agent functionalities such as long-term planning, dynamic environmental interaction, and complex decision-making. Although numerous recent studies have explored various strategies to optimize LLM-based agents for complex agent tasks, a systematic review summarizing and comparing these methods from a holistic perspective is still lacking. In this survey, we provide a comprehensive review of LLM-based agent optimization approaches, categorizing them into parameter-driven and parameter-free methods. We first focus on parameter-driven optimization, covering fine-tuning-based optimization, reinforcement learning-based optimization, and hybrid strategies, analyzing key aspects such as trajectory data construction, fine-tuning techniques, reward function design, and optimization algorithms. Additionally, we briefly discuss parameter-free strategies that optimize agent behavior through prompt engineering and external knowledge retrieval. Finally, we summarize the datasets and benchmarks used for evaluation and tuning, review key applications of LLM-based agents, and discuss major challenges and promising future directions. Our repository for related references is available at https://github.com/YoungDubbyDu/LLM-Agent-Optimization.

2502.12927 2026-02-25 cs.CL

SEFL: A Framework for Generating Synthetic Educational Assignment Feedback with LLM Agents

Mike Zhang, Amalie Pernille Dilling, Léon Gondelman, Niels Erik Ruan Lyngdorf, Euan D. Lindsay, Johannes Bjerva

Comments LREC 2026

详情
英文摘要

Providing high-quality feedback on student assignments is crucial for student success, but it is heavily limited by time and budgetary constraints. In this work, we introduce Synthetic Educational Feedback Loops (SEFL), a synthetic data framework designed to generate data that resembles immediate, on-demand feedback at scale without relying on extensive, real-world student assignments and teacher feedback. To obtain this type of data, two large language models (LLMs) operate in a teacher-student role to simulate assignment completion and formative feedback, generating 19.8K synthetic pairs of student work and corresponding critiques and actionable improvements from a teacher. With this data, we fine-tune smaller, more computationally efficient LLMs on these synthetic pairs, enabling them to replicate key features of high-quality, goal-oriented feedback. Through comprehensive evaluations with three LLM judges and three human experts, across a subset of 900 outputs, we demonstrate that SEFL-tuned models outperform both their untuned counterparts and an existing baseline in terms of feedback quality. The potential for societal impact is reinforced by extensive qualitative comments and ratings from human stakeholders -- both students and higher education instructors. SEFL has the potential to transform feedback processes for higher education and beyond.

2501.17256 2026-02-25 cs.LG

Increasing Information for Model Predictive Control with Semi-Markov Decision Processes

Rémy Hosseinkhan-Boucher, Onofrio Semeraro, Lionel Mathelin

详情
Journal ref
Proceedings of the 6th Annual Learning for Dynamics & Control Conference, p. 1400--1414, volume 242, publisher: Proceedings of Machine Learning Research, 2024
英文摘要

Recent works in Learning-Based Model Predictive Control of dynamical systems show impressive sample complexity performances using criteria from Information Theory to accelerate the learning procedure. However, the sequential exploration opportunities are limited by the system local state, restraining the amount of information of the observations from the current exploration trajectory. This article resolves this limitation by introducing temporal abstraction through the framework of Semi-Markov Decision Processes. The framework increases the total information of the gathered data for a fixed sampling budget, thus reducing the sample complexity.

2501.17115 2026-02-25 cs.LG

Evidence on the Regularisation Properties of Maximum-Entropy Reinforcement Learning

Rémy Hosseinkhan-Boucher, Onofrio Semeraro, Lionel Mathelin

详情
英文摘要

The generalisation and robustness properties of policies learnt through Maximum-Entropy Reinforcement Learning are investigated on chaotic dynamical systems with Gaussian noise on the observable. First, the robustness under noise contamination of the agent's observation of entropy regularised policies is observed. Second, notions of statistical learning theory, such as complexity measures on the learnt model, are borrowed to explain and predict the phenomenon. Results show the existence of a relationship between entropy-regularised policy optimisation and robustness to noise, which can be described by the chosen complexity measures.

2501.16613 2026-02-25 cs.LG cs.AI

Safe Reinforcement Learning for Real-World Engine Control

Julian Bedei, Lucas Koch, Kevin Badalian, Alexander Winkler, Patrick Schaber, Jakob Andert

详情
英文摘要

This work introduces a toolchain for applying Reinforcement Learning (RL), specifically the Deep Deterministic Policy Gradient (DDPG) algorithm, in safety-critical real-world environments. As an exemplary application, transient load control is demonstrated on a single-cylinder internal combustion engine testbench in Homogeneous Charge Compression Ignition (HCCI) mode, that offers high thermal efficiency and low emissions. However, HCCI poses challenges for traditional control methods due to its nonlinear, autoregressive, and stochastic nature. RL provides a viable solution, however, safety concerns, such as excessive pressure rise rates, must be addressed when applying to HCCI. A single unsuitable control input can severely damage the engine or cause misfiring and shut down. Additionally, operating limits are not known a priori and must be determined experimentally. To mitigate these risks, real-time safety monitoring based on the k-nearest neighbor algorithm is implemented, enabling safe interaction with the testbench. The feasibility of this approach is demonstrated as the RL agent learns a control policy through interaction with the testbench. A root mean square error of 0.1374 bar is achieved for the indicated mean effective pressure, comparable to neural network-based controllers from the literature. The toolchain's flexibility is further demonstrated by adapting the agent's policy to increase ethanol energy shares, promoting renewable fuel use while maintaining safety. This RL approach addresses the longstanding challenge of applying RL to safety-critical real-world environments. The developed toolchain, with its adaptability and safety mechanisms, paves the way for future applicability of RL in engine testbenches and other safety-critical settings.

2501.08219 2026-02-25 cs.LG

Characterizing LLM Inference Energy-Performance Tradeoffs across Workloads and GPU Scaling

Paul Joe Maliakel, Shashikant Ilager, Ivona Brandic

详情
英文摘要

LLM inference exhibits substantial variability across queries and execution phases, yet inference configurations are often applied uniformly. We present a measurement-driven characterization of workload heterogeneity and energy-performance behavior of LLM inference under GPU dynamic voltage and frequency scaling (DVFS). We evaluate five decoder-only LLMs (1B-32B parameters) across four NLP benchmarks using a controlled offline setup. We show that lightweight semantic features predict inference difficulty better than input length, with 44.5% of queries achieving comparable quality across model sizes. At the hardware level, the decode phase dominates inference time (77-91%) and is largely insensitive to GPU frequency. Consequently, reducing GPU frequency from 2842 MHz to 180 MHz achieves an average of 42% energy savings with only a 1-6% latency increase. We further provide a use case with an upper-bound analysis of the potential benefits of combining workload-aware model selection with phase-aware DVFS, motivating future energy-efficient LLM inference systems.

2408.07016 2026-02-25 cs.LG cs.AI stat.ML

Rethinking Disentanglement under Dependent Factors of Variation

Antonio Almudévar, Alfonso Ortega

详情
英文摘要

Representation learning is an approach that allows to discover and extract the factors of variation from the data. Intuitively, a representation is said to be disentangled if it separates the different factors of variation in a way that is understandable to humans. Definitions of disentanglement and metrics to measure it usually assume that the factors of variation are independent of each other. However, this is generally false in the real world, which limits the use of these definitions and metrics to very specific and unrealistic scenarios. In this paper we give a definition of disentanglement based on information theory that is also valid when the factors of variation are not independent. Furthermore, we relate this definition to the Information Bottleneck Method. Finally, we propose a method to measure the degree of disentanglement from the given definition that works when the factors of variation are not independent. We show through different experiments that the method proposed in this paper correctly measures disentanglement with non-independent factors of variation, while other methods fail in this scenario.

2309.13411 2026-02-25 cs.LG cs.AI cs.CV

Towards Attributions of Input Variables in a Coalition

Xinhao Zheng, Huiqi Deng, Quanshi Zhang

Comments Accepted to the 2025 International Conference on Machine Learning (ICML 2025)

详情
英文摘要

This paper focuses on the fundamental challenge of partitioning input variables in attribution methods for Explainable AI, particularly in Shapley value-based approaches. Previous methods always compute attributions given a predefined partition but lack theoretical guidance on how to form meaningful variable partitions. We identify that attribution conflicts arise when the attribution of a coalition differs from the sum of its individual variables' attributions. To address this, we analyze the numerical effects of AND-OR interactions in AI models and extend the Shapley value to a new attribution metric for variable coalitions. Our theoretical findings reveal that specific interactions cause attribution conflicts, and we propose three metrics to evaluate coalition faithfulness. Experiments on synthetic data, NLP, image classification, and the game of Go validate our approach, demonstrating consistency with human intuition and practical applicability.

2208.08054 2026-02-25 cs.RO

A Novel Semi-Coupled Hierarchical Motion Planning Framework for Cooperative Transportation of Multiple Mobile Manipulators

Heng Zhang, Haoyi Song, Wenhang Liu, Xinjun Sheng, Zhenhua Xiong, Xiangyang Zhu

Comments 21 pages, 9 figures

详情
Journal ref
Robotica 43 (2025) 4302-4324
英文摘要

Multiple mobile manipulators show superiority in the tasks requiring mobility and dexterity compared with a single robot, especially when manipulating/transporting bulky objects. However, closed-chain of the system, redundancy of each mobile manipulator and obstacles in the environment bring challenges to the motion planning problem. In this paper, we propose a novel semi-coupled hierarchical framework (SCHF), which decomposes the problem into two semi-coupled sub-problems.To be specific, the centralized layer plans the object's motion first and then the decentralized layer independently explores the redundancy of each robot in real-time. A notable feature is that the lower bound of the redundancy constraint metric is ensured besides the closed-chain and obstacle-avoidance constraints in the centralized layer, which ensures the object's motion can be executed by each robot in the decentralized layer. Simulated results show that the success rate and time cost of SCHF outperforms the fully centralized planner and fully decoupled hierarchical planner significantly. In addition, cluttered real-world experiments also show the feasibility of the SCHF in the transportation tasks. A video clip in various scenarios can be found at https://youtu.be/Y8ZrnspIuBg.