arXivDaily: arXiv Daily Academic Digest (updated Monday through Friday)
2601.21975 2026-05-14 cs.AI cs.ET

Mind the Gap: How Elicitation Protocols Shape the Stated-Revealed Preference Gap in Language Models

Pranav Mahajan, Ihor Kendiukhov, Syed Hussain, Lydia Nottingham

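AI summary: This study examines the gap between stated and revealed preferences (the SvR gap) in language models and analyzes how different preference elicitation protocols affect it. Allowing neutrality or abstention during stated preference elicitation improves the correlation between the two, but also allowing abstention in revealed preferences can cause the correlation to drop sharply. The study stresses that elicitation methods must account for indeterminate preferences to assess a model's actual value tendencies more accurately.

Comments Accepted to ACL 2026 Eval Eval Workshop and 3rd Technical AI Safety Conference (TAIS 2026)

Abstract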

Recent work identifies a stated-revealed (SvR) preference gap in language models (LMs): a mismatch between the values models endorse and the choices they make in context. Existing evaluations rely heavily on binary forced-choice prompting, which entangles genuine preferences with artifacts of the elicitation protocol. We systematically study how elicitation protocols affect SvR correlation across 24 LMs. Permitting neutrality and abstention during stated preference elicitation lets us exclude weak signals, substantially improving Spearman's rank correlation ($ρ$) between volunteered stated preferences and forced-choice revealed preferences. However, further allowing abstention in revealed preferences drives $ρ$ to near-zero or negative values due to high neutrality rates. Finally, we find that system prompt steering using stated preferences during revealed preference elicitation does not reliably improve SvR correlation on AIRiskDilemmas. Together, our results show that SvR correlation is highly protocol-dependent and that preference elicitation requires methods that account for indeterminate preferences.
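As a rough illustration of the protocol comparison in this abstract, the sketch below computes Spearman's $ρ$ between stated and revealed preference scores after dropping items where the stated answer was neutral or an abstention. The function names, the use of `None` to mark neutrality or abstention, and the toy scores are assumptions made for this example; they are not taken from the paper.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def svr_correlation(stated, revealed):
    """Correlate only items with a volunteered (non-None) stated preference."""
    kept = [(s, r) for s, r in zip(stated, revealed) if s is not None]
    stated_kept, revealed_kept = zip(*kept)
    return spearman(stated_kept, revealed_kept)


# Hypothetical scores for four values; None marks a neutral or abstaining answer.
stated_scores = [0.9, None, 0.1, 0.5]
revealed_scores = [0.8, 0.1, 0.2, 0.4]
rho = svr_correlation(stated_scores, revealed_scores)
```

Filtering before ranking is the design choice that matters here: items without a volunteered stated preference never enter the rank computation, so they cannot add noise to $ρ$.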

2601.21366 2026-05-14 cs.LG math.OC

Perceptrons and localization of attention's mean-field landscape

Antonio Álvarez-López, Borjan Geshkovski, Domènec Ruiz-Balet

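AI summary: This paper studies the role of the perceptron block in the mean-field landscape of attention in Transformers, modeling the forward pass as an interacting particle system on the unit sphere. By analyzing the gradient flow that arises in certain weight settings and the mean-field limit of infinite context length, it shows that critical points are generically atomic and localized on subsets of the sphere, revealing structural features of attention in high-dimensional spaces.

Abstract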

The forward pass of a Transformer can be seen as an interacting particle system on the unit sphere: time plays the role of layers, particles that of token embeddings, and the unit sphere idealizes layer normalization. In some weight settings the system can even be seen as a gradient flow for an explicit energy, and one can make sense of the infinite context length (mean-field) limit thanks to Wasserstein gradient flows. In this paper we study the effect of the perceptron block in this setting, and show that critical points are generically atomic and localized on subsets of the sphere.
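To make the particle-system picture concrete, here is a minimal sketch of softmax self-attention dynamics on the unit sphere. It assumes identity query, key, and value matrices, an explicit Euler step followed by renormalization (standing in for layer normalization), and an inverse temperature `beta`; all of these are illustrative choices. The perceptron block that the paper studies is deliberately left out, so this shows only the baseline attention flow.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(u):
    norm = math.sqrt(dot(u, u))
    return [a / norm for a in u]


def attention_step(particles, beta, dt):
    """One layer: each token moves toward its softmax-weighted average,
    restricted to the tangent space of the sphere, then is renormalized."""
    updated = []
    for x in particles:
        weights = [math.exp(beta * dot(x, y)) for y in particles]
        total = sum(weights)
        average = [
            sum(w * y[k] for w, y in zip(weights, particles)) / total
            for k in range(len(x))
        ]
        radial = dot(average, x)
        # Remove the radial component so the velocity is tangent at x.
        tangent = [a - radial * b for a, b in zip(average, x)]
        updated.append(normalize([b + dt * t for b, t in zip(x, tangent)]))
    return updated


def simulate(particles, beta=1.0, dt=0.1, steps=200):
    """Iterate layers (time steps) starting from the given token embeddings."""
    for _ in range(steps):
        particles = attention_step(particles, beta, dt)
    return particles


# Hypothetical initial embeddings: three orthogonal unit vectors in 3D.
final_particles = simulate([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
min_similarity = min(
    dot(final_particles[i], final_particles[j])
    for i in range(3)
    for j in range(i + 1, 3)
)
```

In this symmetric toy configuration the three particles drift toward a common point, which is the clustering behavior that the energy landscape describes when no perceptron block is present.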

2601.21033 2026-05-14 cs.LG

Predict-Project-Renoise: Sampling Diffusion Models under Hard Constraints

Omer Rochman-Sharabi, Gilles Louppe

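AI summary: Diffusion models struggle to satisfy hard constraints, yet many applications in the physical sciences require exact satisfaction of conservation laws, boundary conditions, and observational consistency. This paper proposes the Predict-Project-Renoise (PPR) algorithm, which samples from pretrained diffusion models under hard constraints by iteratively projecting through the denoiser and reintroducing noise with the forward diffusion kernel. Across several experiments, the method substantially reduces constraint violations while preserving distributional fidelity, a combination that existing methods fail to achieve.

Comments Code coming soon

Abstract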

Diffusion models cannot enforce hard constraints, yet applications in the physical sciences demand exact satisfaction of conservation laws, boundary conditions, and observational consistency. In this work, we identify a corrector kernel whose unique stationary distribution is the constrained marginal at each noise level, and approximate it by iteratively projecting through the denoiser and renoising via the forward kernel. The resulting Predict-Project-Renoise (PPR) algorithm enables sampling from pretrained diffusion models under hard constraints. Its three components are each necessary: projecting through the denoiser keeps samples close to the data manifold, while renoising and iterating drive samples toward the constrained marginal. On 2D distributions, the Kuramoto-Sivashinsky equation, and global weather forecasting with a $10^8$-dimensional atmospheric model, PPR simultaneously achieves low constraint violations and high distributional fidelity, a combination that existing methods fail to deliver.
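The predict, project, and renoise steps can be sketched on a toy problem. Everything below is an assumption made for illustration rather than the authors' implementation: the data distribution is a standard Gaussian (so the denoiser has a closed form under a variance-preserving forward kernel), the hard constraint is that the coordinates sum to a fixed value, and the hand-off between noise levels is simplified.

```python
import math
import random


def denoise(x_t, alpha_bar):
    """Predict: posterior mean E[x0 | x_t] for toy data x0 ~ N(0, I)."""
    return [math.sqrt(alpha_bar) * v for v in x_t]


def project(x0, total):
    """Project: orthogonal projection onto the hyperplane sum(x) == total."""
    shift = (total - sum(x0)) / len(x0)
    return [v + shift for v in x0]


def renoise(x0, alpha_bar, rng):
    """Renoise: sample the forward kernel q(x_t | x0)."""
    signal, noise = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]


def ppr_sample(total, alpha_bars, inner_steps, rng, dim=2):
    """Run predict-project-renoise at each noise level, from noisy to clean."""
    x_t = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for alpha_bar in alpha_bars:
        for _ in range(inner_steps):
            x0_hat = denoise(x_t, alpha_bar)
            x0_proj = project(x0_hat, total)
            x_t = renoise(x0_proj, alpha_bar, rng)
    # A final predict-and-project returns a clean, feasible sample.
    return project(denoise(x_t, alpha_bars[-1]), total)


# Hypothetical schedule of cumulative signal levels (larger means less noise).
sample = ppr_sample(
    total=1.0, alpha_bars=[0.1, 0.5, 0.9, 0.99], inner_steps=5, rng=random.Random(0)
)
```

Because the last operation is a projection, the returned sample satisfies the constraint up to floating-point error; the inner iterations are what would pull samples toward the constrained marginal in a real model.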

2601.20239 2026-05-14 cs.RO

TouchGuide: Inference-Time Steering of Visuomotor Policies via Touch Guidance

Zhemeng Zhang, Jiahua Ma, Xincheng Yang, Xin Wen, Yuzhi Zhang, Boyan Li, Yiran Qin, Jin Liu, Can Zhao, Li Kang, Haoqin Hong, Zhenfei Yin, Philip Torr, Hao Su, Ruimao Zhang, Daolin Ma

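AI summary: This paper proposes TouchGuide, a method that steers visuomotor policies with tactile guidance at inference time to improve robot performance on fine-grained, contact-rich manipulation tasks. It combines a pre-trained visuomotor policy with a task-specific Contact Physical Model (CPM) and fuses visual and tactile information in a low-dimensional action space to produce refined actions that satisfy physical contact constraints. The work also introduces TacUMI, a data collection system for acquiring reliable tactile data efficiently and at low cost. Experiments show that TouchGuide significantly outperforms existing methods on several challenging tasks.

Abstract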

Fine-grained and contact-rich manipulation remain challenging for robots, largely due to the underutilization of tactile feedback. To address this, we introduce TouchGuide, a novel cross-policy visuo-tactile fusion paradigm that fuses modalities within a low-dimensional action space. Specifically, TouchGuide operates in two stages to guide a pre-trained diffusion or flow-matching visuomotor policy at inference time. First, the policy produces a coarse, visually-plausible action using only visual inputs during early sampling. Second, a task-specific Contact Physical Model (CPM) provides tactile guidance to steer and refine the action, ensuring it aligns with realistic physical contact conditions. Trained through contrastive learning on limited expert demonstrations, the CPM provides a tactile-informed feasibility score to steer the sampling process toward refined actions that satisfy physical contact constraints. Furthermore, to facilitate TouchGuide training with high-quality and cost-effective data, we introduce TacUMI, a data collection system. TacUMI achieves a favorable trade-off between precision and affordability; by leveraging rigid fingertips, it obtains direct tactile feedback, thereby enabling the collection of reliable tactile data. Extensive experiments on five challenging contact-rich tasks, such as shoe lacing and chip handover, show that TouchGuide consistently and significantly outperforms state-of-the-art visuo-tactile policies.

2601.18608 2026-05-14 cs.AI cs.LG

PolySHAP: Extending KernelSHAP with Interaction-Informed Polynomial Regression

Fabian Fumagalli, R. Teal Witter, Christopher Musco

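AI summary: This paper proposes PolySHAP, which extends the KernelSHAP algorithm with higher-degree polynomial regression to capture non-linear interactions between features and thereby improve Shapley value estimates. The authors show that PolySHAP performs better empirically on several benchmark datasets and prove that its estimates are consistent. The work also establishes a theoretical connection between paired sampling (antithetic sampling) and second-order PolySHAP, giving the first strong theoretical justification for this widely used modification.

Comments Published at ICLR 2026: https://openreview.net/forum?id=M19J8UGguq

Abstract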

Shapley values have emerged as a central game-theoretic tool in explainable AI (XAI). However, computing Shapley values exactly requires $2^d$ game evaluations for a model with $d$ features. Lundberg and Lee's KernelSHAP algorithm has emerged as a leading method for avoiding this exponential cost. KernelSHAP approximates Shapley values by approximating the game as a linear function, which is fit using a small number of game evaluations for random feature subsets. In this work, we extend KernelSHAP by approximating the game via higher degree polynomials, which capture non-linear interactions between features. Our resulting PolySHAP method yields empirically better Shapley value estimates for various benchmark datasets, and we prove that these estimates are consistent. Moreover, we connect our approach to paired sampling (antithetic sampling), a ubiquitous modification to KernelSHAP that improves empirical accuracy. We prove that paired sampling outputs exactly the same Shapley value approximations as second-order PolySHAP, without ever fitting a degree 2 polynomial. To the best of our knowledge, this finding provides the first strong theoretical justification for the excellent practical performance of the paired sampling heuristic.
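For reference, the exponential-cost baseline that KernelSHAP and PolySHAP aim to avoid can be written directly from the Shapley formula. The sketch below is neither KernelSHAP nor PolySHAP; it enumerates all subsets for a small hypothetical game with one pairwise interaction, and `toy_game` and its coefficients are made up for the example.

```python
from itertools import combinations
from math import factorial


def exact_shapley(value, d):
    """Exact Shapley values for a game over features 0..d-1.

    `value` maps a frozenset of feature indices to a number. Every one of
    the 2^d subsets is evaluated, which is only feasible for small d.
    """
    phi = [0.0] * d
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            # Probability that exactly `size` given features precede i
            # in a uniformly random ordering.
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


def toy_game(coalition):
    """Hypothetical game: two main effects plus one pairwise interaction."""
    total = 0.0
    if 0 in coalition:
        total += 2.0
    if 1 in coalition:
        total += 1.0
    if 0 in coalition and 1 in coalition:
        total += 3.0  # interaction term that a purely linear fit would miss
    return total


shapley_values = exact_shapley(toy_game, 3)
```

The interaction of 3.0 is split evenly between features 0 and 1, giving values 3.5, 2.5, and 0.0, which sum to the value of the full coalition. A degree-2 polynomial can represent this game exactly, whereas a linear function cannot.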

2512.20211 2026-05-14 cs.SD eess.AS eess.SP

Aliasing-Free Neural Audio Synthesis

Yicheng Gu, Junan Zhang, Chaoren Wang, Jerry Li, Zhizheng Wu, Lauri Juvela

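AI summary: In neural audio synthesis, existing models fall short on high-fidelity music and singing voice because non-linear activation functions and upsampling layers introduce severe aliasing artifacts. This paper incorporates differentiable anti-aliasing techniques into the activation and upsampling modules and proposes Pupu-Vocoder and Pupu-Codec, which improve audio reconstruction quality. Experiments show that the new models outperform existing systems on music, singing voice, and general audio while maintaining comparable performance on speech.

Comments Accepted by TASLP

Abstract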

In neural audio synthesis, neural vocoders and codecs are models that reconstruct waveforms from acoustic and latent representations, which are essential to the resulting audio quality. While current models are capable of generating perceptually natural speech, they still struggle with high-fidelity music and singing voice synthesis, as severe aliasing artifacts are introduced by non-linear activation functions and upsampling layers in existing architectures. Although various anti-aliasing techniques have been proposed in digital signal processing, their integration into neural vocoders and codecs remains under-explored. This paper incorporates differentiable anti-aliasing techniques into the activation and upsampling modules to bridge this gap, and thus presents Pupu-Vocoder and Pupu-Codec. We build a test signal benchmark to evaluate the anti-aliased modules, and validate our proposed models on speech, singing voice, music, and audio. Experimental results show that Pupu-Vocoder and Pupu-Codec outperform existing systems on singing voice, music, and audio, while achieving comparable performance on speech. Demos, codes, and checkpoints are available at VocodexElysium.github.io/AliasingFreeNeuralAudioSynthesis/.

2512.16767 2026-05-14 cs.CV

Make-It-Poseable: Feed-forward Latent Posing Model for 3D Characters

Zhiyang Guo, Ori Zhang, Jax Xiang, Alan Zhao, Zhenxun Yuan, Wengang Zhou, Houqiang Li

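AI summary: This paper proposes Make-It-Poseable, a feed-forward framework that addresses key problems in posing 3D characters, such as inaccurate skinning weights, fixed mesh topologies, and poor pose conformance. It reformulates character posing as a skinning-free latent-space transformation problem and operates on compact latent representations to reconstruct characters in target poses. The framework combines a latent posing transformer, a dense pose representation, and an adaptive completion module; it handles topological changes, shows strong zero-shot generalization across diverse morphologies, and supports a range of 3D authoring tasks.

Comments Project page: https://jasongzy.github.io/Make-It-Poseable/

Abstract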

Posing 3D characters is a fundamental task in computer graphics. However, existing paradigms, ranging from traditional auto-rigging to recent pose-conditioned generative models, frequently struggle with inaccurate skinning weights, fixed mesh topologies, and poor pose conformance. These challenges have become particularly pronounced with the recent explosion of AI-generated 3D assets, which often exhibit flawed structures and fused geometry. To address these issues, we introduce Make-It-Poseable, a novel feed-forward framework that reformulates character posing as a skinning-free latent-space transformation problem. By decoupling shape deformation from the constraints of fixed mesh connectivity, our method directly operates on compact latent representations to reconstruct characters in target poses. To achieve this, our framework integrates a latent posing transformer for shape manipulation, a dense pose representation for fine-grained control, and an adaptive completion module optimized via a bipartite-matched latent loss to robustly handle topological changes. Extensive experiments demonstrate that our method significantly outperforms existing baselines in posing quality. Furthermore, our skeleton-agnostic design exhibits remarkable zero-shot generalization to diverse morphologies including quadrupeds and seamlessly supports various 3D authoring applications such as part replacement and refinement.

2512.10931 2026-05-14 cs.LG cs.CL

Asynchronous Reasoning: Training-Free Interactive Thinking LLMs

George Yakushev, Nataliia Babina, Masoud Vahid Dastgerdi, Vyacheslav Zhdanovskiy, Denis Kuznedelev, Alina Shutova, Max Ryabinin

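AI summary: Many state-of-the-art large language models reason before answering, but this sequential interaction limits their use in real-time settings. This paper proposes a training-free method that lets reasoning-capable models think, listen, and produce output asynchronously, much as humans do. By exploiting properties of positional embeddings, the model can carry out these activities simultaneously, which substantially improves response speed and interactivity.

Comments Preprint, work in progress

Abstract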

Many state-of-the-art LLMs are trained to think before giving their answer. Reasoning can greatly improve language model capabilities, but it also makes them less interactive: given a new input, a model must stop thinking before it can respond. Real-world use cases such as voice-based or embodied assistants require an LLM agent to respond and adapt to additional information in real time, which is incompatible with sequential interactions. In contrast, humans can listen, think, and act asynchronously: we begin thinking about the problem while reading it and continue thinking while formulating the answer. In this work, we augment LLMs capable of reasoning to operate in a similar way without additional training. Our method uses the properties of positional embeddings to enable LLMs built for sequential generation to simultaneously think, listen, and write outputs. We evaluate our approach on math, commonsense, and safety reasoning: it allows models to generate accurate thinking-augmented answers while reducing time to first non-thinking token from minutes to $\le 5$ s and the overall delays by up to $12\times$.

2512.09972 2026-05-14 cs.LG cs.CL cs.NE

AP-BMM: Approximating Capability-Cost Pareto Sets of LLMs via Asynchronous Prior-Guided Bayesian Model Merging

Kesheng Chen, Yamin Hu, Zhenqian Zhu, Yiya Diao, Wenjian Luo

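AI summary: The trade-off between reasoning capability and inference cost is a central concern when deploying large language models (LLMs). This paper proposes Asynchronous Prior-Guided Bayesian Model Merging (AP-BMM), a layer-wise merging approach that uses parameter and reasoning-activation differences to guide the search and asynchronous optimization to improve computational efficiency. Under fixed evaluation budgets, it produces accuracy-cost Pareto sets of higher quality and broader coverage than synchronous optimization and conventional model-level merging methods.

Abstract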

Serving Large Language Models (LLMs) often requires choosing between stronger reasoning and lower inference cost. Model merging offers a practical way to build several models between a reasoning-oriented model and a cheaper base model, but common model-level merging methods usually control this trade-off with only one or two global knobs. We study this setting as a multi-objective optimization problem: instead of producing one merged model, the goal is to find a set of merged models that cover different accuracy--token-cost preferences. Layer-wise merging is more flexible because it can assign different merge weights to different Transformer layers. However, it introduces two practical challenges. First, the layer-wise search space is large, and existing methods often search it without using helpful signals from the source models. Second, LLM evaluations can take very different amounts of time, so synchronous batch optimization wastes GPU time while waiting for slow evaluations. We propose Asynchronous Prior-Guided Bayesian Model Merging (AP-BMM). AP-BMM uses parameter and reasoning-activation differences between the source models to suggest which layers should matter early in the search. It also uses an asynchronous Bayesian optimization loop that accounts for candidate models already being evaluated. A lightweight reranking step further spreads candidates across the accuracy--cost trade-off. Under fixed evaluation budgets, AP-BMM achieves stronger Pareto-set quality and broader trade-off coverage than synchronous layer-wise baselines and representative model-level merging baselines. Compared with the synchronous Bayesian baseline, it also reduces wall-clock time by improving GPU utilization. Code: https://github.com/MiLab-HITSZ/AP-BMM.
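The multi-objective framing can be illustrated by extracting the Pareto set from a list of evaluated candidates. This sketch covers only the final filtering step, not the Bayesian search or reranking in AP-BMM, and the model names and numbers are hypothetical.

```python
def pareto_front(models):
    """Return the non-dominated models, sorted by token cost.

    Each model is a (name, accuracy, token_cost) tuple. Higher accuracy
    and lower token cost are better. A model is dominated if another is
    at least as good on both objectives and strictly better on one.
    """
    front = []
    for name, accuracy, cost in models:
        dominated = any(
            other_acc >= accuracy
            and other_cost <= cost
            and (other_acc > accuracy or other_cost < cost)
            for _, other_acc, other_cost in models
        )
        if not dominated:
            front.append((name, accuracy, cost))
    return sorted(front, key=lambda model: model[2])


# Hypothetical evaluation results: (name, accuracy, average tokens per answer).
candidates = [
    ("base", 0.60, 100),
    ("merge_a", 0.70, 150),
    ("merge_b", 0.68, 180),   # dominated by merge_a
    ("merge_c", 0.75, 400),   # dominated by reasoner at equal cost
    ("reasoner", 0.80, 400),
]
front = pareto_front(candidates)
front_names = [name for name, _, _ in front]
```

Here the front is `base`, `merge_a`, and `reasoner`; a layer-wise search would aim to add more points between them so that more accuracy and cost preferences are covered.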

2511.17001 2026-05-14 cs.RO

Unify Robot Actions in Camera Frame

Sicheng Xie, Lingchen Meng, Zijie Diao, Haidong Cao, Zhiying Du, Shuyuan Tu, Jiaqi Leng, Qiuyue Wang, Mingsheng Li, Shuai Bai, Zuxuan Wu, Yu-Gang Jiang

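AI summary: This paper addresses the consistency of action representations across robot platforms and proposes a unified action representation based on camera extrinsics, so that actions of different embodiments, including single-arm and bimanual robots, share the same geometric semantics in the camera frame. Because existing datasets lack camera extrinsic annotations, the authors present CalibAll, a training-free, robot-independent annotation pipeline that uses a coarse-to-fine calibration strategy to estimate camera extrinsics with high precision and produce standardized action representations. Experiments show that cross-embodiment pretraining with camera-frame actions achieves state-of-the-art performance on multiple tasks.

Abstract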

Cross-embodiment robot learning requires a unified action representation with consistent semantics across robot platforms. Existing representations suffer from platform-specific inconsistencies, while current solutions either maintain embodiment-specific action heads or learn latent action spaces, without fundamentally resolving the mismatch. We propose to unify robot actions in the camera frame using camera extrinsics, so that actions share consistent geometric semantics across different robot embodiments, including both single-arm and bimanual robots. However, most existing datasets lack camera extrinsic annotations, and existing offline calibration methods either suffer from local minima or require robot-specific training data. To address this gap, we present CalibAll, a training-free, robot-independent annotation pipeline that estimates camera extrinsics for offline datasets and converts heterogeneous robot actions into standardized camera-frame actions. CalibAll follows a coarse-to-fine calibration strategy: temporal PnP provides a stable initialization, followed by differentiable rendering-based refinement for high precision. Beyond extrinsics, CalibAll produces standardized TCP-pose actions and auxiliary annotations. We apply CalibAll to 16 datasets across 4 robot platforms, producing approximately 97K calibrated data episodes. Downstream simulation and real-robot experiments show that cross-embodiment pretraining with camera-frame actions achieves state-of-the-art performance.
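The core geometric idea, re-expressing a TCP pose in the camera frame once the extrinsics are known, comes down to composing homogeneous transforms. The sketch below assumes the extrinsics are given as a 4x4 matrix mapping robot-base coordinates to camera coordinates; the helper names and the numeric poses are illustrative and not part of CalibAll.

```python
def matmul4(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
        for i in range(4)
    ]


def make_transform(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    rows = [rotation[i] + [translation[i]] for i in range(3)]
    return rows + [[0.0, 0.0, 0.0, 1.0]]


def to_camera_frame(cam_from_base, base_from_tcp):
    """Express a TCP pose, given in the robot base frame, in the camera frame."""
    return matmul4(cam_from_base, base_from_tcp)


# Hypothetical extrinsics: a 90-degree rotation about z plus a 1 m offset in z.
rot_z_90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cam_from_base = make_transform(rot_z_90, [0.0, 0.0, 1.0])

# Hypothetical TCP pose in the robot base frame.
base_from_tcp = make_transform(identity, [0.5, 0.0, 0.2])

cam_from_tcp = to_camera_frame(cam_from_base, base_from_tcp)
tcp_position_in_camera = [row[3] for row in cam_from_tcp[:3]]
```

Two robots with different base frames but the same camera-frame TCP pose would produce identical `cam_from_tcp` matrices, which is what gives the representation consistent semantics across embodiments.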

2511.09771 2026-05-14 cs.CV

STORM: Segment, Track, and Object Re-Localization from a Single Image

Yu Deng, Teng Cao, Hikaru Shindo, Quentin Delfosse, Jiahong Xue, Kristian Kersting

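AI summary: STORM is a unified framework for reference-conditioned 6D pose estimation and tracking from a single reference image, offering improved robustness with minimal manual input. It combines Hierarchical Spatial Fusion Attention with a BCE-trained tracking verifier, and it reliably recovers the target pose in difficult scenarios such as occlusion and fast motion. Experiments show that STORM outperforms existing methods without annotations and copes with severe occlusions and viewpoint changes.

Comments 21 pages. Accepted at the 43rd International Conference on Machine Learning (ICML 2026); camera-ready version

Abstract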

Accurate 6D pose estimation and tracking are core capabilities for physical AI systems, yet real-world deployment remains brittle and labor-intensive. Many pipelines rely on CAD models, manual masking, or per-object adaptation, and still fail under occlusion or fast motion without a principled way to recognize failure. We propose STORM, a unified framework for reference-conditioned 6D tracking that can operate from a single reference image, with minimal manual input and improved robustness. STORM combines: (i) Hierarchical Spatial Fusion Attention (HSFA), a task-driven reference-query fusion architecture that supports both single-reference and multi-reference conditioning and can optionally use vision-language semantic conditioning to resolve instance ambiguities; and (ii) a BCE-trained tracking verifier whose continuous compatibility logit is used as an energy-like score to detect drift and trigger automatic re-initialization. Experiments on LM-O and YCB-Video show that STORM improves annotation-free pose tracking accuracy over strong baselines and recovers reliably from severe occlusions and rapid viewpoint changes with minimal overhead.

2510.13385 2026-05-14 cs.LG

Probabilistic Prediction Markets with Intermittent Contributions

Michael Vitali, Pierre Pinson

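AI summary: This paper studies how prediction market mechanisms can enable multi-party collaboration on accurate forecasting when data ownership and competitive interests stand in the way. It proposes a prediction market framework that lets agents enter and exit the market at will, adapts to time-varying conditions, and accounts for agents' historical performance. The design uses robust regression models to handle missing submissions and a pay-off allocation mechanism that considers both in-sample and out-of-sample performance. Experiments on simulated and real-world data show that the design is effective and adaptable.

Abstract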

Although both data availability and the demand for accurate forecasts are increasing, collaboration between stakeholders is often constrained by data ownership and competitive interests. In contrast to recent proposals within cooperative game-theoretical frameworks, we place ourselves in a more general framework, based on prediction markets. There, independent agents trade forecasts of uncertain future events in exchange for rewards. We introduce and analyse a prediction market that (i) accounts for the historical performance of the agents, (ii) adapts to time-varying conditions, while (iii) permitting agents to enter and exit the market at will. The proposed design employs robust regression models to learn the optimal combination of forecasts whilst handling missing submissions. Moreover, we introduce a pay-off allocation mechanism that considers both in-sample and out-of-sample performance while satisfying several desirable economic properties. Case studies using simulated and real-world data demonstrate the effectiveness and adaptability of the proposed market design.

2509.22123 2026-05-14 cs.CL

Multilingual Vision-Language Models, A Survey

Andrei-Alexandru Manea, Jindřich Libovický

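AI summary: This survey covers multilingual vision-language models that process text and images across languages. It reviews 33 models and 23 benchmarks, analyzes trends in encoder-only and generative architectures, and identifies a key tension between language neutrality and cultural awareness. Current training methods favor language neutrality through contrastive learning, while cultural awareness depends on diverse data. Most evaluation benchmarks prioritize semantic consistency, though recent work has begun to incorporate culturally grounded content to close this gap.

Abstract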

This survey examines multilingual vision-language models that process text and images across languages. We review 33 models and 23 benchmarks, spanning encoder-only and generative architectures, and identify a key tension between language neutrality (consistent cross-lingual representations) and cultural awareness (adaptation to cultural contexts). Current training methods favor neutrality through contrastive learning, while cultural awareness depends on diverse data. Two-thirds of evaluation benchmarks use translation-based approaches prioritizing semantic consistency, though recent work incorporates culturally grounded content. We find discrepancies in cross-lingual capabilities and gaps between training objectives and evaluation goals.

2509.21543 2026-05-14 cs.RO

Self-CriTeach: LLM Self-Teaching and Self-Critiquing for Improving Robotic Planning via Automated Domain Generation

Jinbang Huang, Zhiyuan Li, Yuanzhao Hu, Zhanguang Zhang, Mark Coates, Xingyue Quan, Yingxue Zhang

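AI summary: This study proposes Self-CriTeach, a framework that improves robotic planning through LLM self-teaching and self-critiquing. An LLM autonomously generates symbolic planning domains, which are used both to produce large-scale robotic problem-plan pairs for supervised fine-tuning and as structured reward functions that provide dense feedback for reinforcement learning. This unified training pipeline yields higher planning success rates, stronger cross-task generalization, lower inference cost, and reduced sensitivity to imperfect logical states.

Comments International Conference on Machine Learning (ICML) 2026

Abstract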

Large Language Models (LLMs) have shown strong promise for robotic task planning, particularly through the automatic generation of symbolic planning domains. However, prior work mainly treats generated domains as planning utilities. Such pipelines remain brittle under imperfect logical states and perception noise, while overlooking the potential of generated domains as scalable sources of reasoning supervision and structured reward signals. At the same time, reasoning LLMs depend on chain-of-thought (CoT) supervision, which is expensive to collect for robotic tasks, and reinforcement learning (RL) faces challenges in reward engineering. We propose Self-CriTeach, an LLM self-teaching and self-critiquing framework in which an LLM autonomously generates symbolic planning domains that serve a dual role: (1) In the self-teaching stage, generated domains are used to produce large-scale robotic planning problem--plan pairs, which are automatically converted into extended CoT trajectories for supervised fine-tuning. (2) In the self-critiquing stage, the same domains are reused as structured reward functions, providing dense feedback for reinforcement learning without manual reward engineering. This unified training pipeline yields a planning-enhanced LLM with higher planning success rates, stronger cross-task generalization, reduced inference cost, and improved resistance to imperfect logical states. GitHub Page: https://markli1hoshipu.github.io/Plan_LLM/

2509.20786 2026-05-14 cs.LG

LiLAW: Lightweight Learnable Adaptive Weighting to Learn Sample Difficulty & Improve Noisy Training

Abhishek Moturu, Muhammad Muzammil, Anna Goldenberg, Babak Taati

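AI summary: This paper proposes LiLAW (Lightweight Learnable Adaptive Weighting), a method for improving the training of deep neural networks under noise and data heterogeneity. It dynamically adjusts the loss weight of each sample according to its difficulty (easy, moderate, or hard) using three global learnable scalar parameters, which are updated with a single gradient descent step on a validation mini-batch after each training mini-batch, without requiring a clean validation set. Experiments show that LiLAW improves accuracy and AUROC across a range of datasets and noise conditions, especially in high-noise settings, and that it is computationally efficient and suited to resource-constrained settings.

Abstract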

Training deep neural networks with noise and data heterogeneity is a major challenge. We introduce Lightweight Learnable Adaptive Weighting (LiLAW), a method that dynamically adjusts the loss weight of each training sample based on its evolving difficulty, categorized as easy, moderate, and hard, using only three global learnable scalar parameters. LiLAW learns to adaptively prioritize samples by updating these parameters with a single gradient descent step on a validation mini-batch after each training mini-batch, without requiring a clean, unbiased validation set. Experiments across general and medical imaging datasets, several noise types and levels, loss functions, and architectures with and without pretraining, including linear probing and full fine-tuning, show that LiLAW consistently improves accuracy and AUROC, especially in higher-noise settings, without requiring excessive tuning. We also obtain state-of-the-art results incorporating synthetic and augmented data from SynPAIN, GAITGen, ECG5000, and improved fairness on the Adult dataset. LiLAW is lightweight, practical, and computationally efficient, making it an effective, scalable approach to boost generalization and robustness across diverse deep learning training setups, especially in resource-constrained settings.

2509.18993 2026-05-14 cs.LG

CR-Net: Scaling Parameter-Efficient Training with Cross-Layer Low-Rank Structure

Boao Kong, Junzhu Liang, Yuxi Liu, Renjia Deng, Kun Yuan

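AI summary: This paper proposes CR-Net, a parameter-efficient pre-training framework that addresses the shortcomings of current low-rank methods in model performance, computational overhead, and activation memory savings. Building on the finding that inter-layer activation residuals are low-rank, CR-Net uses a dual-path architecture that efficiently reconstructs layer activations by combining previous-layer outputs with their low-rank differences, preserving high-rank information with far fewer parameters. Experiments at model scales from 60M to 7B parameters show that CR-Net outperforms existing low-rank methods while using fewer computational resources and less memory.

Comments 32 pages. Accepted by ICLR 2026

Abstract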

Low-rank architectures have become increasingly important for efficient large language model (LLM) pre-training, providing substantial reductions in both parameter complexity and memory/computational demands. Despite these advantages, current low-rank methods face three critical shortcomings: (1) compromised model performance, (2) considerable computational overhead, and (3) limited activation memory savings. To address these limitations, we propose Cross-layer Low-Rank residual Network (CR-Net), an innovative parameter-efficient framework inspired by our discovery that inter-layer activation residuals possess low-rank properties. CR-Net implements this insight through a dual-path architecture that efficiently reconstructs layer activations by combining previous-layer outputs with their low-rank differences, thereby maintaining high-rank information with minimal parameters. We further develop a specialized activation recomputation strategy tailored for CR-Net that dramatically reduces memory requirements. Extensive pre-training experiments across model scales from 60M to 7B parameters demonstrate that CR-Net consistently outperforms state-of-the-art low-rank frameworks while requiring fewer computational resources and less memory.

2509.13316 2026-05-14 cs.CL cs.LG

Do Activation Verbalization Methods Convey Privileged Information?

Millicent Li, Alberto Mario Ceballos Arroyo, Giordano Rogers, Naomi Saphra, Byron C. Wallace

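AI summary: This paper asks whether activation verbalization methods reveal the internal workings of large language models (LLMs). It finds that existing methods may largely reflect the parametric knowledge of the verbalizer model rather than the internal state of the target model. Experiments show that one can perform well on existing benchmarks without access to target model internals, which suggests that current datasets are inadequate for evaluating verbalization methods and that stricter benchmarks and experimental controls are needed to test their explanatory power.

Comments ICML 2026. 41 pages, 23 tables, 6 figures

Abstract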

Recent interpretability methods have proposed to translate LLM internal representations into natural language descriptions using a second verbalizer LLM. This is intended to illuminate how the target model represents and operates on inputs. But do such activation verbalization approaches actually provide privileged knowledge about the internal workings of the target model, or do they merely convey information about the inputs provided to it? We critically evaluate popular verbalization methods and datasets used in prior work and find that one can perform well on such benchmarks without access to target model internals, suggesting that these datasets are not ideal for evaluating verbalization methods. We then run controlled experiments which reveal that verbalizations often reflect the parametric knowledge of the verbalizer LLM that generated them, rather than the knowledge of the target LLM whose activations are decoded. Taken together, our results indicate a need for targeted benchmarks and experimental controls to rigorously assess whether verbalization methods provide meaningful insights into the operations of LLMs.

2508.09479 2026-05-14 cs.CV

SkySplat: Generalizable 3D Gaussian Splatting from Multi-Temporal Sparse Satellite Images

Xuejun Huang, Xinyi Liu, Yi Wan, Zhi Zheng, Bin Zhang, Mingtao Xiong, Yingying Pei, Yongjun Zhang

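AI summary: This paper proposes SkySplat, a self-supervised framework for generalizable 3D Gaussian Splatting reconstruction from multi-temporal sparse satellite images. By integrating the rational polynomial coefficient (RPC) model into the generalizable 3DGS pipeline, it addresses the limited geometric constraints, transient object interference, and radiometric inconsistencies that hamper existing methods on satellite images. SkySplat relies only on RGB images and radiometric-robust relative height supervision, so it reconstructs efficiently and accurately without ground-truth height maps, and it shows superior performance and cross-dataset generalization on multiple benchmark datasets.

Comments AAAI 2026. Code is available at https://github.com/NanCheng2001/SkySplat-main

Abstract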

Three-dimensional scene reconstruction from sparse-view satellite images is a long-standing and challenging task. While 3D Gaussian Splatting (3DGS) and its variants have recently attracted attention for their high efficiency, existing methods remain unsuitable for satellite images due to incompatibility with rational polynomial coefficient (RPC) models and limited generalization capability. Recent advances in generalizable 3DGS approaches show potential, but they perform poorly on multi-temporal sparse satellite images due to limited geometric constraints, transient objects, and radiometric inconsistencies. To address these limitations, we propose SkySplat, a novel self-supervised framework that integrates the RPC model into the generalizable 3DGS pipeline, enabling more effective use of sparse geometric cues for improved reconstruction. SkySplat relies only on RGB images and radiometric-robust relative height supervision, thereby eliminating the need for ground-truth height maps. Key components include a Cross-Self Consistency Module (CSCM), which mitigates transient object interference via consistency-based masking, and a multi-view consistency aggregation strategy that refines reconstruction results. Compared to per-scene optimization methods, SkySplat achieves an 86 times speedup over EOGS with higher accuracy. It also outperforms generalizable 3DGS baselines, significantly reducing MAE from 13.18 m to 1.80 m on the DFC19 dataset, and demonstrates strong cross-dataset generalization on the MVS3D benchmark. The code is available at https://github.com/NanCheng2001/SkySplat-main

2508.09320 2026-05-14 cs.LG cs.AI cs.CR

Exact Verification of Graph Neural Networks with Incremental Constraint Solving

Minghao Liu, Chia-Hsuan Lu, Marta Kwiatkowska

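AI summary: This paper proposes an exact verification method for graph neural networks (GNNs) that computes robustness guarantees against adversarial attacks involving attribute and structural perturbations. The method combines constraint solving with bound tightening and uses the incremental solving capabilities of solvers to improve efficiency. It supports three aggregation functions (sum, max, and mean), the latter two for the first time. Experiments on several real-world datasets demonstrate its usability and its superior performance on node classification.

Comments Extended version of the paper accepted at FM 2026

Abstract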

Graph neural networks (GNNs) are increasingly employed in high-stakes applications, such as fraud detection or healthcare, but are susceptible to adversarial attacks. A number of techniques have been proposed to provide adversarial robustness guarantees, but support for commonly used aggregation functions in message-passing GNNs is lacking. In this paper, we develop an exact (sound and complete) verification method for GNNs to compute guarantees against attribute and structural perturbations that involve edge addition or deletion, subject to budget constraints. Our method employs constraint solving with bound tightening, and iteratively solves a sequence of relaxed constraint satisfaction problems while relying on incremental solving capabilities of solvers to improve efficiency. We implement GNNev, a versatile exact verifier for message-passing neural networks, which supports three aggregation functions -- sum, max and mean -- with the latter two considered here for the first time. Extensive experimental evaluation of GNNev on real-world fraud datasets (Amazon and Yelp) and biochemical datasets (MUTAG and ENZYMES) demonstrates its usability and effectiveness, as well as superior performance on node classification and competitiveness on graph classification compared to existing exact verification tools on sum-aggregated GNNs.

2507.12720 2026-05-14 cs.CL

FLEXITOKENS: Flexible Tokenization for Evolving Language Models

Abraham Toluwase Owodunni, Orevaoghene Ahia, Sachin Kumar

AI总结 本文研究了语言模型在面对新数据分布时的适应性问题,指出传统子词分词器的固定性导致在分布外领域、未见过的语言或脚本中出现文本过度碎片化的问题。为此,作者提出了一种可学习的字节级分词器,通过预测输入字节序列的边界来实现自适应分词,并设计了FLEXITOKENS这一简化训练目标,显著提升了分词的灵活性。实验表明,该方法在多种多语言基准和生成任务中有效减少了分词过度碎片化,相比BPE等传统分词方法在分类和生成任务上提升了约10个百分点。

Comments Accepted to ACL (findings) 2026

Details
English abstract

Adapting language models to new data distributions by simple finetuning is challenging. This is due to the rigidity of their subword tokenizers, which typically remain unchanged during adaptation. This inflexibility often leads to inefficient tokenization, causing over-fragmentation of text in out-of-distribution domains, unseen languages, or scripts. In this work, we develop byte-level LMs with learnable tokenizers to make tokenization adaptive. Our models include a submodule that learns to predict boundaries given the input byte sequence, encoding it into variable-length segments. Most tokenizer-free methods train this boundary predictor using an auxiliary loss that enforces a fixed compression rate across the training corpus, introducing a new kind of rigidity. We propose FLEXITOKENS, a simplified training objective that enables significantly greater flexibility during adaptation. Evaluating across multiple multilingual benchmarks, morphologically diverse tasks, and domains, we demonstrate that FLEXITOKENS consistently reduces token over-fragmentation and achieves improvements of up to 10 percentage points on token classification and generative tasks compared to BPE and other gradient-based tokenizer baselines. We validate our findings using models of varying sizes, and our method demonstrates consistent improvements across scales. Code and data for our experiments will be released at https://github.com/skai-research/flexitokens

2507.09205 2026-05-14 cs.CL

From Curated Data to Scalable Models: Continual Pre-training of Dense and MoE Large Language Models for Tibetan

Lei Yang, Leiyu Pan, Bojian Xiong, Renren Jin, Shaowei Zhang, Yue Chen, Ling Shi, Jiang Zhou, Junru Wu, Zhen Wang, Jianxiang Peng, Juesi Xiao, Tianyu Dong, Zhuowen Han, Zhuo Chen, Yuqi Ren, Deyi Xiong

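AI summary This work addresses the underdevelopment of large language models for low-resource languages such as Tibetan with an end-to-end pipeline: it builds a 72 GB high-quality Tibetan corpus and adapts Qwen2.5-7B through multilingual continual pre-training and instruction tuning. To scale capacity further, the dense model is extended to a 50B-A10B Mixture-of-Experts architecture, and several high-quality evaluation datasets are constructed. Experiments show that both the dense and MoE models outperform existing models of similar scale across diverse tasks, offering a useful reference for LLM research on Tibetan and other low-resource languages.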

Details
English abstract

Large language models (LLMs) have achieved remarkable success across a wide range of natural language processing tasks, yet their performance remains heavily biased toward high-resource languages. Tibetan, despite its cultural significance and large speaker population, is still substantially underrepresented. In this work, we present a comprehensive pipeline for advancing Tibetan language modeling through large-scale data curation and continual pre-training. We construct a 72 GB high-quality Tibetan corpus, the largest to date, and adapt Qwen2.5-7B through balanced multilingual continual pre-training with Tibetan, Chinese, and English, followed by multilingual instruction tuning. To further scale capacity efficiently, we extend the dense model to a 50B-A10B Mixture-of-Experts architecture. Due to the absence of standardized Tibetan benchmarks, we build multiple evaluation datasets via high-quality translation and human verification. Experimental results show that both dense and MoE models consistently outperform existing open-source and Tibetan-focused models of similar scale across diverse tasks. Our work advances Tibetan-centric LLM research and provides transferable insights for extending LLMs to other low-resource languages. We will release the model weights, evaluation benchmarks, and detailed data processing documentation in the follow-up.

2507.07316 2026-05-14 cs.LG cs.CR

AdeptHEQ-FL: Adaptive Homomorphic Encryption for Federated Learning of Hybrid Classical-Quantum Models with Dynamic Layer Sparing

Md Abrar Jahin, Taufikur Rahman Fuad, M. F. Mridha, Nafiz Fahad, Md. Jakir Hossen

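AI summary This work proposes AdeptHEQ-FL, a unified hybrid classical-quantum federated learning framework that aims to balance model performance, privacy preservation, and communication efficiency in non-IID settings. It combines a hybrid CNN-PQC architecture, an accuracy-weighted aggregation scheme based on differentially private validation accuracies, selective homomorphic encryption, and dynamic layer-wise adaptive freezing, enabling secure aggregation of sensitive model layers while minimizing communication overhead. Experiments on CIFAR-10 and other datasets show notable accuracy gains and better communication efficiency than existing methods.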

Comments Accepted in 1st International Workshop on ICCV'25 BISCUIT (Biomedical Image and Signal Computing for Unbiasedness, Interpretability, and Trustworthiness)

Details
Journal ref
1st International Workshop on BISCUIT at ICCV 2025
English abstract

Federated Learning (FL) faces inherent challenges in balancing model performance, privacy preservation, and communication efficiency, especially in non-IID decentralized environments. Recent approaches either sacrifice formal privacy guarantees, incur high overheads, or overlook quantum-enhanced expressivity. We introduce AdeptHEQ-FL, a unified hybrid classical-quantum FL framework that integrates (i) a hybrid CNN-PQC architecture for expressive decentralized learning, (ii) an adaptive accuracy-weighted aggregation scheme leveraging differentially private validation accuracies, (iii) selective homomorphic encryption (HE) for secure aggregation of sensitive model layers, and (iv) dynamic layer-wise adaptive freezing to minimize communication overhead while preserving quantum adaptability. We establish formal privacy guarantees, provide convergence analysis, and conduct extensive experiments on the CIFAR-10, SVHN, and Fashion-MNIST datasets. AdeptHEQ-FL achieves a $\approx 25.43\%$ and $\approx 14.17\%$ accuracy improvement over Standard-FedQNN and FHE-FedQNN, respectively, on the CIFAR-10 dataset. Additionally, it reduces communication overhead by freezing less important layers, demonstrating the efficiency and practicality of our privacy-preserving, resource-aware design for FL.
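The accuracy-weighted aggregation in item (ii) can be illustrated with a minimal sketch. The abstract does not specify the mechanism, so everything below is an assumption made for illustration: the Laplace mechanism with sensitivity `1/n_val`, the clipping bounds, and the hypothetical helper names `privatize_accuracy` and `aggregate`. It is not the AdeptHEQ-FL implementation.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_accuracy(acc, n_val, epsilon, rng):
    """Assumed Laplace mechanism: accuracy over n_val examples has sensitivity 1/n_val."""
    noisy = acc + laplace_noise(1.0 / (n_val * epsilon), rng)
    return min(1.0, max(1e-6, noisy))  # clip so every weight stays positive


def aggregate(updates, noisy_accs):
    """Average client parameter vectors, weighted by normalized noisy accuracies."""
    total = sum(noisy_accs)
    weights = [a / total for a in noisy_accs]
    dim = len(updates[0])
    return [sum(w * u[i] for w, u in zip(weights, updates)) for i in range(dim)]


rng = random.Random(0)
# Two hypothetical clients with validation accuracies 0.9 and 0.6.
accs = [privatize_accuracy(a, n_val=1000, epsilon=1.0, rng=rng) for a in (0.9, 0.6)]
global_update = aggregate([[1.0, 2.0], [3.0, 4.0]], accs)
```

Because the weights are normalized, the aggregate is a convex combination of the client updates, so a client with higher reported accuracy pulls the global update toward its own parameters.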

2505.21238 2026-05-14 cs.CV

3D-UIR: 3D Gaussian for Underwater 3D Scene Reconstruction via Physics Based Appearance-Medium Decoupling

Jieyu Yuan, Yujun Li, Yuanlin Zhang, Chunle Guo, Xiongxin Tang, Ruixing Wang, Chongyi Li

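AI summary This paper proposes 3D-UIR, a physics-based 3D Gaussian Splatting method that addresses light-medium coupling in underwater 3D scene reconstruction. By decoupling object appearance from water medium effects and introducing explicit medium embeddings, it improves scene consistency and rendering quality. A depth-guided optimization strategy further improves geometric accuracy, yielding clear gains in novel view synthesis and scene restoration for underwater scenes.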

Comments Accepted to IEEE TIP 2026. Project webpage: https://bilityniu.github.io/3D-UIR

Details
English abstract

Novel view synthesis for underwater scene reconstruction presents unique challenges due to complex light-media interactions. Optical scattering and absorption in the water body introduce inhomogeneous medium attenuation that disrupts the conventional volume rendering assumption of a uniform propagation medium. While 3D Gaussian Splatting (3DGS) offers real-time rendering capabilities, it struggles in inhomogeneous underwater environments, where scattering media introduce artifacts and inconsistent appearance. In this study, we propose a physics-based framework that disentangles object appearance from water medium effects through tailored Gaussian modeling. Our approach introduces appearance embeddings, which are explicit medium representations for backscatter and attenuation, enhancing scene consistency. In addition, we propose a depth-guided optimization strategy that leverages pseudo-depth maps as supervision with depth regularization and scale penalty terms to improve geometric fidelity. By integrating the proposed appearance and medium modeling components via an underwater imaging model, our approach achieves both high-quality novel view synthesis and physically accurate scene restoration. Experiments demonstrate significant improvements in rendering quality and restoration accuracy over existing methods. The project page is available at https://bilityniu.github.io/3D-UIR.

2505.15616 2026-05-14 cs.CV

LENS: Multi-level Evaluation of Multimodal Reasoning with Large Language Models

Ruilin Yao, Bo Zhang, Jirui Huang, Xinwei Long, Yifang Zhang, Tianyu Zou, Yufei Wu, Shichao Su, Yifan Xu, Wenxi Zeng, Zhaoyu Yang, Guoyou Li, Shilan Zhang, Zichan Li, Yaxiong Chen, Shengwu Xiong, Peng Xu, Jiajun Zhang, Bowen Zhou, David Clifton, Luc Van Gool

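AI summary This work introduces LENS, a multi-level benchmark for evaluating multimodal large language models on perception, understanding, and reasoning tasks. LENS contains 3.4K contemporary images and more than 60K human-authored questions covering eight tasks and 12 daily scenarios, supporting evaluation from basic perception to complex reasoning. With rich annotations and images collected from social media, the dataset reflects real-world model behavior more faithfully; experiments show that no current frontier model exceeds 60% accuracy on the reasoning tasks.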

Comments Published as a conference paper at ICLR 2026

Details
English abstract

Multimodal Large Language Models (MLLMs) have achieved significant advances in integrating visual and linguistic information, yet their ability to reason about complex, real-world scenarios remains limited. Existing benchmarks are usually constructed in a task-oriented manner, with no guarantee that samples for different tasks come from the same data distribution, so they often fall short in evaluating the synergistic effects of lower-level perceptual capabilities on higher-order reasoning. To lift this limitation, we contribute Lens, a multi-level benchmark with 3.4K contemporary images and 60K+ human-authored questions covering eight tasks and 12 daily scenarios, forming three progressive task tiers, i.e., perception, understanding, and reasoning. One feature is that each image is equipped with rich annotations for all tasks. Thus, this dataset intrinsically supports evaluating how MLLMs handle image-invariable prompts, from basic perception to compositional reasoning. In addition, our images are manually collected from social media, and 53% of them were published later than Jan. 2025. We evaluate 15+ frontier MLLMs such as Qwen2.5-VL-72B, InternVL3-78B, GPT-4o and two reasoning models, QVQ-72B-preview and Kimi-VL. These models were released later than Dec. 2024, and none of them achieves an accuracy greater than 60% in the reasoning tasks. Project page: https://github.com/Lens4MLLMs/lens. ICCV 2025 workshop page: https://lens4mllms.github.io/mars2-workshop-iccv2025/

2505.09760 2026-05-14 cs.RO cs.NE

Neural Associative Skill Memories for safer robotics and modelling human sensorimotor repertoires

Pranav Mahajan, Mufeng Tang, T. Ed Li, Ioannis Havoutis, Ben Seymour

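AI summary This paper proposes Neural Associative Skill Memories, a framework intended to make robots safer and more adaptive in complex environments. It uses self-supervised predictive coding to unify skill learning and expression, recognizing and executing skills from context without explicit skill selection while also supporting fault detection. Unlike conventional approaches, the model uses local learning rules and predicts a speed-accuracy trade-off relevant to biological motor preparation, offering a new computational perspective on neurorobotics and human sensorimotor learning.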

Details
Journal ref
Neural Computation (2026) 38 (1): 1-27
English abstract

Modern robots face challenges shared by humans, where machines must learn multiple sensorimotor skills and express them adaptively. Equipping robots with a human-like memory of how it feels to do multiple stereotypical movements can make robots more aware of normal operational states and help develop self-preserving safer robots. Associative Skill Memories (ASMs) aim to address this by linking movement primitives to sensory feedback, but existing implementations rely on hard-coded libraries of individual skills. A key unresolved problem is how a single neural network can learn a repertoire of skills while enabling fault detection and context-aware execution. Here we introduce Neural Associative Skill Memories (Neural ASMs), a framework that utilises self-supervised predictive coding for temporal prediction to unify skill learning and expression, using biologically plausible learning rules. Unlike traditional ASMs, which require explicit skill selection, Neural ASMs implicitly recognize and express skills through contextual inference, enabling fault detection across learned behaviours without an explicit skill selection mechanism. Compared to recurrent neural networks trained via backpropagation through time, our model achieves comparable qualitative performance in skill memory expression while using local learning rules and predicts a biologically relevant speed-accuracy trade-off during skill memory expression. This work advances the field of neurorobotics by demonstrating how predictive coding principles can model adaptive robot control and human motor preparation. By unifying fault detection, reactive control, skill memorisation and expression into a single energy-based architecture, Neural ASMs contribute to safer robotics and provide a computational lens to study biological sensorimotor learning.

2502.18917 2026-05-14 cs.AI cs.PL cs.SE

ClassInvGen: Class Invariant Synthesis using Large Language Models

Chuyue Sun, Viraj Agashe, Saikat Chakraborty, Jubi Taneja, Clark Barrett, David Dill, Xiaokang Qiu, Shuvendu K. Lahiri

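AI summary ClassInvGen uses large language models (LLMs) to generate high-quality class invariants for mainstream programming languages such as C++. By co-generating executable class invariants and test inputs, it improves the correctness and completeness of the invariants and outperforms both pure LLM-based and traditional data-driven approaches in experiments. The work also contributes a benchmark of standard C++ data structures and validates the method on a real-world codebase through a case study.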

Details
English abstract

Formal program specifications in the form of preconditions, postconditions, and class invariants have several benefits for the construction and maintenance of programs. They not only aid in program understanding due to their unambiguous semantics but can also be enforced dynamically (or even statically when the language supports a formal verifier). However, synthesizing high-quality specifications in an underlying programming language is limited by the expressivity of the specifications or the need to express them in a declarative manner. Prior work has demonstrated the potential of large language models (LLMs) for synthesizing high-quality method pre/postconditions for Python and Java, but does not consider class invariants. In this work, we describe ClassInvGen, a method for co-generating executable class invariants and test inputs to produce high-quality class invariants for a mainstream language such as C++, leveraging LLMs' ability to synthesize pure functions. We show that ClassInvGen outperforms a pure LLM-based technique to generate specifications (from code) as well as prior data-driven invariant inference techniques such as Daikon. We contribute a benchmark of standard C++ data structures along with a harness that can help measure both the correctness and completeness of generated specifications using tests and mutants. We also demonstrate its applicability to real-world code by performing a case study on several classes within a widely used and high-integrity C++ codebase.

2502.05157 2026-05-14 cs.LG cs.DS

Efficient distributional regression trees learning algorithms for calibrated non-parametric probabilistic forecasts

Quentin Duchemin, Guillaume Obozinski

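AI summary This paper proposes efficient algorithms for learning probabilistic regression trees that produce calibrated non-parametric probabilistic forecasts under the weighted interval score (WIS) or continuous ranked probability score (CRPS) loss. Using min-max heaps, weight-balanced binary trees, and Fenwick trees makes the algorithms computationally efficient. In numerical experiments the method performs competitively with existing approaches, inherits the interpretability of tree models, and is well suited to conformal prediction with group-conditional coverage guarantees.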

Details
English abstract

The prospect of developing trustworthy AI for critical applications in science and engineering requires machine learning techniques that are capable of estimating their own uncertainty. In the context of regression, instead of estimating a conditional mean, this can be achieved by producing a predictive interval for the output, or even by learning a model of the conditional probability $p(y|x)$ of an output $y$ given input features $x$. While this can be done under parametric assumptions with, e.g., generalized linear models, these assumptions are typically too strong, and non-parametric models offer flexible alternatives. In particular, for scalar outputs, directly learning a model of the conditional cumulative distribution function of $y$ given $x$ can lead to more precise probabilistic estimates, and the use of proper scoring rules such as the weighted interval score (WIS) and the continuous ranked probability score (CRPS) leads to better coverage and calibration properties. This paper introduces novel algorithms for learning probabilistic regression trees for the WIS or CRPS loss functions. These algorithms are made computationally efficient thanks to an appropriate use of known data structures, namely min-max heaps, weight-balanced binary trees and Fenwick trees. Through numerical experiments, we demonstrate that the performance of our methods is competitive with alternative approaches. Additionally, our methods benefit from the inherent interpretability and explainability of trees. As a by-product, we show how our trees can be used in the context of conformal prediction and explain why they are particularly well-suited for achieving group-conditional coverage guarantees.
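To make the CRPS loss concrete, here is a minimal sketch of the standard empirical estimator, CRPS(F, y) = E|X - y| - 0.5 E|X - X'|, for a predictive distribution given by the samples falling in one leaf. It is not the paper's tree-learning algorithm, and the function name `crps_empirical` is hypothetical; the sketch only shows the quantity such a tree would minimize.

```python
def crps_empirical(samples, y):
    """CRPS of the empirical distribution of `samples` against observation `y`."""
    xs = sorted(samples)
    n = len(xs)
    # First term: mean absolute distance between samples and the observation.
    term1 = sum(abs(x - y) for x in xs) / n
    # Sum over i < j of (x_j - x_i), computed in O(n) on the sorted samples.
    pair_sum = sum(x * (2 * i - n + 1) for i, x in enumerate(xs))
    # Second term: 0.5 * E|X - X'| = 0.5 * (2 * pair_sum) / n^2.
    term2 = pair_sum / (n * n)
    return term1 - term2


score = crps_empirical([0.0, 1.0], 0.5)
```

After sorting, the pairwise term costs only linear time, which hints at why ordered data structures matter when a split search has to re-evaluate this loss over many candidate partitions.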

2501.10598 2026-05-14 cs.LG

Addressing Finite-Horizon MDPs via Low-Rank Tensor Value Approximation

Sergio Rozada, Jose Luis Orejuela, Antonio G. Marques

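AI summary This paper studies learning optimal policies in finite-horizon Markov Decision Processes (MDPs) by approximating value functions with low-rank tensors. Because value functions in finite-horizon MDPs are non-stationary, which worsens the problems of high dimensionality and sample complexity, the authors model them as low-rank tensors to obtain a scalable representation, and combine low-rank policy evaluation with greedy policy improvement in a policy iteration framework to compute near-optimal policies. The method introduces an optimization-based framework for solving the Bellman equations together with block-coordinate descent algorithms, and estimates value functions from sampled trajectories when the system dynamics are unknown. Experiments show reduced computational demands and competitive policy performance.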

Details
English abstract

We study the problem of learning optimal policies in finite-horizon Markov Decision Processes (MDPs) using low-rank reinforcement learning (RL) methods. In finite-horizon MDPs, the policies, and therefore the value functions (VFs) are not stationary. This aggravates the challenges of high-dimensional MDPs, as they suffer from the curse of dimensionality and high sample complexity. To address these issues, we propose modeling the VFs of finite-horizon MDPs as low-rank tensors, enabling a scalable representation that renders the problem of learning optimal policies tractable. Our approach focuses on VF approximation within a policy iteration framework, where low-rank policy evaluation is combined with greedy policy improvement to compute near-optimal policies. We introduce an optimization-based framework for solving the Bellman equations with low-rank constraints, along with block-coordinate descent (BCD) and block-coordinate gradient descent (BCGD) algorithms, both with theoretical convergence guarantees. We further establish that bounded low-rank policy evaluation error translates into bounded policy improvement in the finite-horizon setting. For scenarios where the system dynamics are unknown, we adapt the proposed BCGD method to estimate the VFs using sampled trajectories. Numerical experiments further demonstrate that the proposed framework reduces computational demands in controlled synthetic scenarios and more realistic resource allocation problems, while achieving competitive policy performance in terms of attained returns.
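The scalability argument can be illustrated with a small sketch of a low-rank value-function tensor. The abstract does not name the tensor format, so this sketch assumes a CP (PARAFAC) decomposition and treats the time step as one more tensor dimension next to three state dimensions. The helper names and sizes are hypothetical.

```python
import random


def make_factors(dims, rank, rng):
    """One factor matrix (dim x rank) per tensor dimension."""
    return [[[rng.uniform(-1.0, 1.0) for _ in range(rank)] for _ in range(d)]
            for d in dims]


def value(factors, index):
    """Evaluate V[index] = sum_r prod_k factors[k][index[k]][r]."""
    rank = len(factors[0][0])
    total = 0.0
    for r in range(rank):
        prod = 1.0
        for factor, i in zip(factors, index):
            prod *= factor[i][r]
        total += prod
    return total


def num_params(dims, rank):
    """Parameters stored by the CP representation."""
    return rank * sum(dims)


def full_size(dims):
    """Entries in the full (unfactorized) tensor."""
    size = 1
    for d in dims:
        size *= d
    return size


# Assumed layout: horizon 10 and three state dimensions of size 20 each.
dims = (10, 20, 20, 20)
factors = make_factors(dims, rank=5, rng=random.Random(0))
v = value(factors, (3, 0, 19, 7))          # value at time step 3, state (0, 19, 7)
compression = full_size(dims) / num_params(dims, rank=5)
```

With these assumed sizes the factorized form stores 350 numbers instead of 80,000, and block-coordinate methods of the kind named in the abstract would update one factor matrix at a time while holding the others fixed.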

2501.05982 2026-05-14 cs.LG eess.SP

Deep Variational Sequential Monte Carlo for High-Dimensional Observations

Wessel L. van Nierop, Nir Shlezinger, Ruud J. G. van Sloun

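AI summary This paper proposes a deep variational sequential Monte Carlo method for nonlinear state-space systems with high-dimensional observations. Neural networks parameterize the proposal and state-transition distributions, which are learned with the unsupervised variational SMC objective to improve particle filtering performance. Experiments show that the method outperforms established baselines at tracking the Lorenz attractor from high-dimensional, partial observations, and an evidence lower bound based evaluation indicates a more accurate representation of the posterior distribution.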

Details
Journal ref
ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, India, 2025
English abstract

Sequential Monte Carlo (SMC), or particle filtering, is widely used in nonlinear state-space systems, but its performance often suffers from poorly approximated proposal and state-transition distributions. This work introduces a differentiable particle filter that leverages the unsupervised variational SMC objective to parameterize the proposal and transition distributions with a neural network, designed to learn from high-dimensional observations. Experimental results demonstrate that our approach outperforms established baselines in tracking the challenging Lorenz attractor from high-dimensional and partial observations. Furthermore, an evidence lower bound based evaluation indicates that our method offers a more accurate representation of the posterior distribution.
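For readers unfamiliar with SMC, the following is a minimal sketch of a plain bootstrap particle filter, the non-learned baseline that this line of work builds on. It is not the paper's method: the transition model doubles as the proposal, both distributions are fixed Gaussians rather than neural networks, and the 1-D random-walk model and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def bootstrap_particle_filter(observations, n_particles=500,
                              trans_std=1.0, obs_std=0.5, seed=0):
    """Track a 1-D random-walk state from noisy scalar observations."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # Propose: sample each particle from the transition model.
        particles = [x + rng.gauss(0.0, trans_std) for x in particles]
        # Weight: Gaussian observation log-likelihood, shifted for stability.
        log_w = [-0.5 * ((y - x) / obs_std) ** 2 for x in particles]
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        total = sum(w)
        w = [wi / total for wi in w]
        # Estimate: posterior mean under the weighted particle set.
        estimates.append(sum(wi * x for wi, x in zip(w, particles)))
        # Resample: multinomial resampling proportional to the weights.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return estimates


est = bootstrap_particle_filter([5.0] * 30)
```

When the proposal is this crude, many particles receive negligible weight, which is the weakness that a learned proposal and transition model are meant to address.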

2410.22643 2026-05-14 cs.RO

An Overtaking Trajectory Planning Framework Based on Spatio-temporal Topology and Reachable Set Analysis Ensuring Time Efficiency

Wule Mao, Zhouheng Li, Entao Sun, Lei Xie, Hongye Su

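AI summary This paper proposes SROP, an overtaking trajectory planning framework based on spatio-temporal topology and reachable set analysis, to address the local optima and low computational efficiency of conventional hierarchical planning in high-speed scenarios. Topological classes represent distinct overtaking behaviors: an upper-layer planner performs a spatio-temporal search to generate diverse initial paths, and a lower-layer planner evaluates trajectories in parallel using reachable sets, decoupling vehicle kinematic constraints and accelerating computation. Experiments show substantial gains in trajectory smoothness and computational efficiency, and tests on the F1TENTH simulation platform validate its practicality and robustness in challenging scenarios.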

Details
English abstract

Generating overtaking trajectories in high-speed scenarios is typically addressed through hierarchical planning, which often suffers from local optima due to single initial solutions and low computational efficiency during numerical optimization. To overcome these limitations, this paper proposes a Spatio-temporal topology and Reachable set analysis enhanced Overtaking trajectory Planning framework (SROP). Specifically, by introducing topological classes to represent distinct overtaking behaviors, the upper-layer planner performs a spatio-temporal search to extract diverse initial paths, effectively preventing local optima. Subsequently, a lower-layer planner conducts parallel trajectory evaluation using reachable sets, which decouples vehicle kinematic constraints from the optimization process to ensure feasibility and significantly accelerate computation. Numerical experiments demonstrate that SROP improves trajectory smoothness by 66.8% and reduces computation time by 62.9% compared to state-of-the-art methods. Furthermore, by seamlessly integrating the method into the F1TENTH autonomous racing simulation platform, a 100-lap sensitivity analysis demonstrates high overtaking success rates in challenging scenarios, thereby validating its practical utility, real-time efficiency, and robustness.