arXivDaily: arXiv Daily Academic Digest, updated Monday through Friday
2605.14571 2026-05-15 cs.RO cs.LG

Let Robots Feel Your Touch: Visuo-Tactile Cortical Alignment for Embodied Mirror Resonance

Tianfang Zhu, Ning An, Rui Wang, Jiasi Gao, Qingming Luo, Anan Li, Guyue Zhou

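AI summary: This study aims to give robots a "mirror touch" ability, so that they can predict and simulate the corresponding tactile signals by observing touch on others. It proposes Mirror Touch Net, a model that uses multi-level constraints to align visual and tactile representations semantically, distributionally, and geometrically, and thereby predicts millimetre-scale tactile signals on a robotic hand from RGB images. The method improves the accuracy of cross-modal perception and offers an explainable, neurally grounded basis for empathic tactile interaction in robots.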

English abstract

Observing touch on another's body can elicit corresponding tactile sensations in the observer, a phenomenon termed mirror touch that supports empathy and social perception. This visuo-tactile resonance is thought to rely on structural correspondence between visual and somatosensory cortices, yet robotic systems lack computational frameworks that instantiate this principle. Here we demonstrate that cortical correspondence can be operationalized to endow robots with mirror touch. We introduce Mirror Touch Net, which imposes semantic, distributional and geometric alignment between visual and tactile representations through multi-level constraints, enabling prediction of millimetre-scale tactile signals across 1,140 taxels on a robotic hand from RGB images. Manifold analysis reveals that these constraints reshape visual representations into geometry consistent with the tactile manifold, reducing the complexity of cross-modal mapping. Extending this alignment framework to cross-domain observations of human hands enables tactile prediction and reflexive responses to observed human touch. Our results link a neural principle of visuo-tactile resonance to robotic perception, providing an explainable route towards anticipatory touch and empathic human-robot interaction. Code is available at https://github.com/fun0515/Mirror-Touch-Net.

2605.14570 2026-05-15 cs.CL

Uncertainty Quantification for Large Language Diffusion Models

Artem Vazhentsev, Vladislav Smirnov, David Li, Maxim Panov, Timothy Baldwin, Artem Shelmanov

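AI summary: This paper studies uncertainty quantification (UQ) for large language diffusion models (LLDMs), with the goal of making their inference more reliable. Because existing methods are incompatible with the parallel nature of LLDMs, the authors propose lightweight, zero-shot uncertainty signals based on intermediate generations during denoising, token remasking dynamics, and denoising complexity. Experiments show that the method detects hallucinations in generated content effectively while keeping inference efficient, striking a good balance between computational overhead and performance.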

English abstract

Large Language Diffusion Models (LLDMs) are emerging as an alternative to autoregressive models, offering faster inference through higher parallelism. Similar to autoregressive LLMs, they remain prone to hallucinations, making reliable uncertainty quantification (UQ) crucial for safe deployment. However, existing UQ methods are fundamentally misaligned with this new paradigm: they assume autoregressive factorization or use expensive repeated sampling, negating the efficiency of LLDMs. In this work, we present the first systematic study of UQ for LLDMs and propose lightweight, zero-shot uncertainty signals derived from the iterative denoising process, leveraging intermediate generations, token remasking dynamics, and denoising complexity. We further adapt a state-of-the-art UQ method to LLDMs by combining masked diffusion likelihoods with trajectory-based semantic dissimilarity. We prove that expected trajectory dissimilarity lower bounds the masked diffusion training objective, which motivates its usage as an uncertainty score. Comprehensive experiments across three tasks, eight datasets, and two models show that our method achieves a great cost-performance trade-off: it approaches the strongest sampling-based baselines while incurring up to 100x lower computational overhead. Our work demonstrates that LLDMs can deliver both fast inference and reliable hallucination detection simultaneously.

2605.14569 2026-05-15 cs.CV

Bridging Brain and Semantics: A Hierarchical Framework for Semantically Enhanced fMRI-to-Video Reconstruction

Yujie Wei, Chenglong Ma, Jianxiong Gao, Chenhui Wang, Shiwei Zhang, Biao Gong, Shuai Tan, Hangjie Yuan, Hongming Shan

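AI summary: This paper proposes CineNeuron, a hierarchical framework that addresses the semantic gap in reconstructing dynamic video from functional magnetic resonance imaging (fMRI) signals. Inspired by the dual-pathway processing mechanism of the human brain, it uses a bottom-up semantic enrichment stage to map fMRI signals into a rich semantic space and a top-down memory integration stage to dynamically fuse relevant memories from previously seen data, improving reconstruction quality. Experiments show that CineNeuron outperforms state-of-the-art methods on two fMRI-to-video benchmarks.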

Comments Accepted to CVPR 2026

English abstract

Reconstructing dynamic visual experiences as videos from functional magnetic resonance imaging (fMRI) is pivotal for advancing the understanding of neural processes. However, current fMRI-to-video reconstruction methods are hindered by a semantic gap between noisy fMRI signals and the rich content of videos, stemming from a reliance on incomplete semantic embeddings that neither capture video-specific cues (e.g., actions) nor integrate prior knowledge. To this end, we draw inspiration from the dual-pathway processing mechanism in the human brain and introduce CineNeuron, a novel hierarchical framework for semantically enhanced video reconstruction from fMRI signals with two synergistic stages. First, a bottom-up semantic enrichment stage maps fMRI signals to a rich embedding space that comprehensively captures textual semantics, image contents, action concepts, and object categories. Second, a top-down memory integration stage utilizes the proposed Mixture-of-Memories method to dynamically select relevant "memories" from previously seen data and fuse them with the fMRI embedding to refine the video reconstruction. Extensive experimental results on two fMRI-to-video benchmarks demonstrate that CineNeuron surpasses state-of-the-art methods across various metrics.

2605.14566 2026-05-15 cs.CV

SpectraFlow: Unifying Structural Pretraining and Frequency Adaptation for Medical Image Segmentation

Zhiquan Chen, Haitao Wang, Guowei Zou, Hejun Wu

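AI summary: Medical image segmentation remains challenging when data are scarce; conventional methods often generalize poorly and produce blurred boundaries because annotations are insufficient. This paper proposes the SpectraFlow framework, which combines structure-aware pretraining with boundary-oriented decoding to improve segmentation accuracy. The method has two stages: the first learns structure-related representations through Mixed-Domain MeanFlow Pretraining; the second introduces a lightweight decoder that combines attentional fusion with frequency-directional convolution to sharpen boundary detail and improve robustness. Experiments show that the method outperforms existing methods on several medical datasets, especially in low-data settings.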

English abstract

Medical image segmentation remains challenging in low-data regimes, where scarce annotations often yield poor generalization and ambiguous boundaries with missing fine structures. Recent self-supervised pretraining has improved transferability, but it often exhibits a texture bias. In contrast, accurate segmentation is inherently geometry-aware and depends on both topological consistency and precise boundary preservation. To address this problem, we propose a two-stage framework that couples structure-aware encoder pretraining with boundary-oriented decoding. In Stage-1, we aim to learn structure-aware representations for downstream segmentation in low-data regimes. To this end, we propose Mixed-Domain MeanFlow Pretraining, which aligns images and binary masks in a shared latent space through latent transport regression, where masks act as conditional structural guidance rather than prediction targets, making the pretraining task-agnostic. To further improve training stability under scarce supervision, we incorporate a lightweight Dispersive Loss to prevent representation collapse. In Stage-2, we fine-tune the pretrained encoder with a lightweight decoder that combines Direct Attentional Fusion for adaptive cross-scale gating and Frequency-Directional Dynamic Convolution for high-frequency boundary refinement under appearance variation. Experiments on ISIC-2016, Kvasir-SEG, and GlaS demonstrate consistent gains over state-of-the-art methods, with improved robustness in low-data settings and sharper boundary delineation.

2605.14561 2026-05-15 cs.AI

Prompt Segmentation and Annotation Optimisation: Controlling LLM Behaviour via Optimised Segment-Level Annotations

Devika Prasad, Luke Gerschwitz, Tong Li, Henry Xiao, Anjin Liu, Coco Wu, Anna Leontjeva, Luiz Pizzato

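AI summary: This paper proposes Prompt Segmentation and Annotation Optimisation (PSAO), a structured prompt optimisation framework intended to improve controllability and efficiency when interacting with large language models. The method decomposes a prompt into interpretable segments and adds a human-readable annotation to each, guiding the model to allocate its focus sensibly and reducing confusion during response generation. Experiments show that optimised segment-level annotations improve reasoning accuracy and consistency, while the original prompt is retained as an optimisation candidate to avoid performance degradation. The work demonstrates the feasibility and potential of segment-level annotation optimisation; how to determine the optimal segmentation and annotations efficiently is left to future research.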

English abstract

Prompt engineering is crucial for effective interaction with generative artificial intelligence systems, yet existing optimisation methods often operate over an unstructured and vast prompt space, leading to high computational costs and potential distortions of the original intent. We introduce Prompt Segmentation and Annotation Optimisation (PSAO), a structured prompt optimisation framework designed to improve prompt optimisation controllability and efficiency. PSAO decomposes a prompt into interpretable segments (e.g., sentences) and augments each with human-readable annotations (e.g., {not important}, {important}, {very important}). These annotations guide large language models (LLMs) in allocating focus and clarifying confusion during response generation. We formally define the segmentations and annotations and demonstrate that optimised segment-level annotations can lead to improved LLM responses, with the original prompt retained as a candidate in the optimisation space to prevent performance degradation. Empirical evaluations indicate that PSAO benefits from annotations in terms of improved reasoning accuracy and self-consistency. However, developing efficient methods for identifying optimal segmentations and annotations remains challenging and is reserved for future investigation. This work is intended as a proof of concept, demonstrating the feasibility and potential of segment-level annotation optimisation.
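
The segment-and-annotate step described in this abstract can be illustrated with a minimal sketch. The annotation strings come from the abstract's own examples; the regex-based sentence splitter, the function names `segment` and `annotate`, and the choice to append each annotation after its segment are assumptions made for illustration, not the authors' implementation. The optimisation over annotations is not shown.

```python
import re

# Annotation vocabulary taken from the examples given in the abstract.
ANNOTATIONS = ("{not important}", "{important}", "{very important}")

def segment(prompt):
    """Split a prompt into sentence-level segments (a simple regex heuristic)."""
    parts = re.split(r"(?<=[.!?])\s+", prompt.strip())
    return [p for p in parts if p]

def annotate(segments, labels):
    """Append one annotation to each segment; a label of None leaves it unchanged."""
    if len(segments) != len(labels):
        raise ValueError("need exactly one label (or None) per segment")
    out = []
    for seg, label in zip(segments, labels):
        if label is None:
            out.append(seg)
        elif label in ANNOTATIONS:
            out.append(f"{seg} {label}")
        else:
            raise ValueError(f"unknown annotation: {label}")
    return " ".join(out)

prompt = "Answer in one word. The sky is blue. What colour is the sky?"
segs = segment(prompt)
annotated = annotate(segs, ["{very important}", "{not important}", "{important}"])
# The all-None labelling reproduces the original prompt, so the unannotated
# prompt stays in the candidate set, as the abstract describes.
original = annotate(segs, [None] * len(segs))
```

Keeping the all-`None` labelling as a candidate mirrors the abstract's point that the original prompt remains in the optimisation space.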

2605.14558 2026-05-15 cs.LG cs.AI cs.CL

Resolving Action Bottleneck: Agentic Reinforcement Learning Informed by Token-Level Energy

Langzhou He, Junyou Zhu, Yue Zhou, Zhengyao Gu, Junhua Liu, Wei-Chieh Huang, Henry Peng Zou, David Wipf, Philip S. Yu, Qitian Wu

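AI summary: This paper studies the uneven allocation of training signal across trajectories in agentic reinforcement learning, pointing out that existing methods treat every token in a trajectory equally and therefore allocate training signal poorly. From an energy-based modeling perspective, the authors find that the effective training signal concentrates on action tokens rather than reasoning tokens, a phenomenon they call the "Action Bottleneck". They propose ActFocus, a simple and effective token reweighting method that lowers the gradient weight of reasoning tokens and increases the weight of high-uncertainty action tokens, which significantly improves model performance.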

Comments Preprint

English abstract

Agentic reinforcement learning trains large language models using multi-turn trajectories that interleave long reasoning traces with short environment-facing actions. Common policy-gradient methods, such as PPO and GRPO, treat each token in a trajectory equally, leading to uniform credit assignment. In this paper, we critically demonstrate that such uniform credit assignment largely misallocates token-level training signals. From an energy-based modeling perspective, we show that token-level training signals, quantified by their correlations with reward variance of different rollouts sampled from a given prompt, concentrate sharply on action tokens rather than reasoning tokens, even though action tokens account for only a small fraction of the trajectory. We refer to this phenomenon as the Action Bottleneck. Motivated by this observation, we propose an embarrassingly simple token reweighting approach, ActFocus, that downweights gradients on reasoning tokens, along with an additional energy-based redistribution mechanism that further increases the weights on action tokens with higher uncertainty. Across four environments and different model sizes, ActFocus consistently outperforms PPO and GRPO, yielding final-step gains of up to 65.2 and 63.7 percentage points, respectively, without any additional runtime or memory cost.
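
As a rough illustration of the token reweighting idea, the sketch below computes a policy-gradient surrogate loss in which reasoning tokens receive a smaller weight than action tokens. The function name, the `reasoning_weight` value, and the normalisation by total weight are assumptions; the abstract does not specify them, and the energy-based redistribution over uncertain action tokens is not modelled here.

```python
def reweighted_pg_loss(logprobs, is_action, advantage, reasoning_weight=0.1):
    """Token-weighted policy-gradient surrogate for a single trajectory.

    logprobs:         per-token log-probabilities under the current policy
    is_action:        per-token flags, True for environment-facing action tokens
    advantage:        scalar trajectory-level advantage
    reasoning_weight: hypothetical downweighting factor for reasoning tokens
    """
    weights = [1.0 if a else reasoning_weight for a in is_action]
    total = sum(weights)
    weighted = sum(w * lp for w, lp in zip(weights, logprobs))
    return -advantage * weighted / total

logprobs = [-1.0, -2.0, -0.5, -0.5]      # two reasoning tokens, then two action tokens
is_action = [False, False, True, True]

# reasoning_weight=1.0 recovers uniform credit assignment over tokens.
uniform = reweighted_pg_loss(logprobs, is_action, advantage=1.0, reasoning_weight=1.0)
focused = reweighted_pg_loss(logprobs, is_action, advantage=1.0, reasoning_weight=0.1)
```

With the downweighting applied, the action tokens account for most of the loss, so most of the gradient would flow through them.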

2605.14556 2026-05-15 cs.AI

TeachAnything: A Multimodal Crowdsourcing Platform for Training Embodied AI Agents in Symmetrical Reality

Zidong Liu, Rongkai Liu, Yue Li, Zhenliang Zhang

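AI summary: This paper proposes TeachAnything, a multimodal crowdsourcing platform for training embodied agents in Symmetrical Reality. Through a three-stage demonstration paradigm that integrates multimodal demonstration signals, the platform supports collecting diverse demonstration data across scenes, tasks, and embodiments. By unifying virtual and physical interaction, the system provides a practical foundation for building embodied agents that meet the demands of Symmetrical Reality.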

Comments 5 pages, 3 figures. Accepted as an IEEE VR 2026 Poster

English abstract

Symmetrical Reality (SR) is emerging as a future trend for human-agent coexistence, placing higher demands on agents to acquire human-like intelligence. It calls for richer and more diverse human guidance. We introduce a three-stage demonstration paradigm integrating multimodal demonstration signals. Building on this paradigm, we developed TeachAnything, a cloud-based, crowdsourcing-oriented demonstration platform with physics simulation capable of collecting diverse demonstration data across varied scenes, tasks, and embodiments. By unifying virtual and physical interactions through both methodological design and physics simulation, the system serves as a practical foundation for developing embodied agents aligned with Symmetrical Reality.

2605.14555 2026-05-15 cs.SD cs.AI

Break-the-Beat! Controllable MIDI-to-Drum Audio Synthesis

Shuyang Cui, Zhi Zhong, Qiyu Wu, Zachary Novack, Woosung Choi, Keisuke Toyama, Kin Wai Cheuk, Junghyun Koo, Yukara Ikemiya, Christian Simon, Chihiro Nagashima, Shusuke Takahashi

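AI summary: This paper proposes "Break-the-Beat!", a controllable MIDI-to-drum audio synthesis model that addresses the lack of fine-grained control when generating drum loop audio in digital music production. The model fine-tunes a pre-trained text-to-audio model with a content encoder and a hybrid conditioning mechanism, so that it can generate drum audio with the timbre of a reference audio. Experiments show strong results in audio quality, rhythmic alignment, and beat continuity, giving music producers an efficient, controllable creative tool.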

English abstract

Current methods for creating drum loop audio in digital music production, such as using one-shot samples or resampling, often demand non-trivial effort from creators. While recent generative models achieve high fidelity and adhere to text, they lack the specific control needed for such a task. Existing symbolic-to-audio research often focuses on single, tonal instruments, leaving the challenge of polyphonic, percussive drum synthesis unaddressed. We address this gap by introducing "Break-the-Beat!", a model capable of rendering a drum MIDI with the timbre of a reference audio. It is built by fine-tuning a pre-trained text-to-audio model with our proposed content encoder and an effective hybrid conditioning mechanism. To enable this, we construct a new dataset of paired target-reference drum audio from existing drum audio datasets. Experiments demonstrate that our model generates high-quality drum audio that follows high-resolution drum MIDI, achieving strong performance across metrics of audio quality, rhythmic alignment, and beat continuity. This offers producers a new, controllable tool for creative production. Demo page: https://ik4sumii.github.io/break-the-beat/

2605.14553 2026-05-15 cs.LG cs.AI

Efficient Multi-objective Prompt Optimization via Pure-exploration Bandits

Donghao Li, Chengshuai Shi, Weijuan Ou, Cong Shen, Jing Yang

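AI summary: This paper studies multi-objective prompt selection, which aims to efficiently identify the prompts that perform best across multiple performance metrics. The authors cast the problem in the pure-exploration bandit framework and introduce an efficient algorithm for structured bandits, with theoretical error guarantees in the linear case. Experiments show that the method significantly outperforms baselines on several large language models, providing a principled and efficient solution for multi-objective prompt optimization.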

Comments Published as a conference paper at ICLR 2026

English abstract

Prompt engineering has become central to eliciting the capabilities of large language models (LLMs). At its core lies prompt selection: efficiently identifying the most effective prompts. However, most prior investigations overlook a key challenge: the inherently multi-faceted nature of prompt performance, which cannot be captured by a single metric. To fill this gap, we study the multi-objective prompt selection problem under two practical settings: Pareto prompt set recovery and best feasible prompt identification. Casting the problem into the pure-exploration bandits framework, we adapt provably efficient algorithms from multi-objective bandits and further introduce a novel design for best feasible arm identification in structured bandits, with theoretical guarantees on the identification error in the linear case. Extensive experiments across multiple LLMs show that the bandit-based approaches yield significant improvements over baselines, establishing a principled and efficient framework for multi-objective prompt optimization.
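
The two target sets named in this abstract can be made concrete with a small sketch that assumes the true mean score of every prompt on every metric is already known. That assumption removes the actual bandit problem, which is to identify these sets from noisy samples with few evaluations, so this shows only what is being recovered. The prompt names, scores, and the reading of "best feasible" as "maximise metric 0 subject to thresholds on the other metrics" are illustrative assumptions.

```python
def dominates(a, b):
    """True if a is at least as good as b on every metric and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_set(scores):
    """Names of prompts that no other prompt dominates (higher is better)."""
    return sorted(
        name for name, s in scores.items()
        if not any(dominates(t, s) for other, t in scores.items() if other != name)
    )

def best_feasible(scores, thresholds):
    """Best prompt on metric 0 among those meeting thresholds on metrics 1, 2, ..."""
    feasible = {
        name: s for name, s in scores.items()
        if all(v >= t for v, t in zip(s[1:], thresholds))
    }
    return max(feasible, key=lambda n: feasible[n][0]) if feasible else None

# Hypothetical mean scores: (accuracy, safety) per prompt.
scores = {"p1": (0.9, 0.4), "p2": (0.7, 0.8), "p3": (0.6, 0.7), "p4": (0.9, 0.3)}
front = pareto_set(scores)
choice = best_feasible(scores, thresholds=(0.5,))
```

Here `p3` is dominated by `p2` and `p4` by `p1`, so the Pareto set is `p1` and `p2`; under a safety threshold of 0.5 the best feasible prompt is `p2`.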

2605.14551 2026-05-15 cs.LG

SeesawNet: Towards Non-stationary Time Series Forecasting with Balanced Modeling of Common and Specific Dependencies

Hao Li, Lu Zhang, Liu Chong, Yankai Chen, Pengyang Wang, Yingjie Zhou

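AI summary: This paper proposes SeesawNet, a new network architecture for non-stationary time series forecasting that balances the modeling of dependencies common across samples against instance-specific ones. Its core is Adaptive Stationary-Nonstationary Attention (ASNA), which extracts common dependencies from normalized sequences and specific dependencies from raw sequences, then fuses them adaptively according to each sample's non-stationarity. Experiments show that SeesawNet outperforms existing state-of-the-art methods on several real-world datasets, demonstrating its effectiveness on non-stationary time series.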

Comments Accepted by IJCAI-ECAI 2026, the 35th International Joint Conference on Artificial Intelligence. Code is at https://github.com/dreamone-Lee/SeesawNet

English abstract

Instance normalization (IN) is widely used in non-stationary multivariate time series forecasting to reduce distribution shifts and highlight common patterns across samples. However, IN can over-smooth instance-specific structural information that is essential for modeling temporal and cross-channel heterogeneity. While prior methods further suppress distribution discrepancies or attempt to recover temporal specific dependencies, they often ignore a central tension: how to adaptively model common and instance-specific dependency based on each instance's non-stationary structures. To address this dilemma, we propose SeesawNet, a unified architecture that dynamically balances common and instance-specific dependency modeling in both temporal and channel dimensions. At its core is Adaptive Stationary-Nonstationary Attention (ASNA), which captures common dependencies from normalized sequences and specific dependencies from raw sequences, and adaptively fuses them according to instance-level non-stationarity. Built upon ASNA, SeesawNet alternates dedicated temporal and channel relationship modeling to jointly capture long-range and cross-variable dependencies. Extensive experiments on multiple real-world benchmarks demonstrate that SeesawNet consistently outperforms state-of-the-art methods.

2605.14550 2026-05-15 cs.LG

Multi-Dimensional Model Integrity and Responsibility Assessment Index and Scoring Framework

Phuc Truong Loc Nguyen, Thanh Hung Do, Truong Thanh Hung Nguyen, Hung Cao

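AI summary: This paper proposes MIRAI, a unified evaluation framework for comprehensively assessing the integrity and responsibility of AI models in high-stakes tabular domains. It covers five dimensions (explainability, fairness, robustness, privacy, and sustainability) and aggregates them into a single score. By using normalized, direction-aligned dimension scores, the framework allows direct comparison between models of different architectures and computational complexity. Experiments show that models with high predictive performance are not necessarily more responsible overall, and that some simple models achieve a better cross-dimensional balance than complex deep tabular models, giving a practical basis for responsible model selection in regulated settings.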

Comments Accepted to the 39th Canadian Conference on Artificial Intelligence (Canadian AI 2026)

English abstract

Artificial intelligence in high-stakes tabular domains cannot be evaluated by predictive performance alone, yet current practice still assesses explainability, fairness, robustness, privacy, and sustainability mostly in isolation. We propose the Model Integrity and Responsibility Assessment Index (MIRAI), a unified evaluation framework that measures tabular models across these five dimensions under a controlled comparison setting and aggregates them into a single score. MIRAI combines established metrics through normalized and direction-aligned dimension scores, which enables direct comparison across models with different architectural and computational profiles. Experiments on healthcare, financial, and socioeconomic datasets show that higher predictive performance does not necessarily imply better overall integrity and responsibility. In several cases, simpler models achieve a stronger cross-dimensional balance than more complex deep tabular architectures. MIRAI provides a compact and practical basis for responsible model selection in regulated settings.
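
One plausible reading of "normalized and direction-aligned dimension scores" is min-max scaling followed by flipping the metrics where lower raw values are better, then averaging. The sketch below follows that reading. The metric names, the three example dimensions (MIRAI uses five), the equal weighting, and the min-max choice are all assumptions, since the abstract does not give the actual formulas.

```python
def normalize(values, higher_is_better):
    """Min-max scale to [0, 1]; flip so that 1.0 is always the best score."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    out = {}
    for model, v in values.items():
        scaled = 0.5 if span == 0 else (v - lo) / span
        out[model] = scaled if higher_is_better else 1.0 - scaled
    return out

def aggregate(raw, higher_is_better):
    """Average the direction-aligned dimension scores into one score per model."""
    per_dim = {d: normalize(vals, higher_is_better[d]) for d, vals in raw.items()}
    models = next(iter(raw.values())).keys()
    return {m: sum(per_dim[d][m] for d in per_dim) / len(per_dim) for m in models}

# Hypothetical raw measurements for three models on three dimensions.
raw = {
    "fairness_gap_pct": {"A": 1, "B": 3, "C": 2},        # lower is better
    "robust_accuracy_pct": {"A": 60, "B": 80, "C": 70},  # higher is better
    "energy_kwh": {"A": 1, "B": 3, "C": 5},              # lower is better
}
higher_is_better = {
    "fairness_gap_pct": False,
    "robust_accuracy_pct": True,
    "energy_kwh": False,
}
scores = aggregate(raw, higher_is_better)
```

In this toy data, model A has the lowest robust accuracy but the best aggregate score, which is the kind of outcome the abstract reports for simpler models.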

2605.14548 2026-05-15 cs.CV

Local Spatiotemporal Convolutional Network for Robust Gait Recognition

Xiaoyun Wang, Cunrong Li, Wu Wang

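AI summary: This paper studies how to robustly recognize gait features from video sequences in order to improve the accuracy and stability of gait recognition. To address the high computational demands of existing methods and their difficulty in capturing the intrinsic motion patterns across consecutive frames, the authors propose the Local Spatiotemporal Convolutional Network (LSTCN), a structurally simple yet efficient network. By introducing a Global Bidirectional Spatial Pooling mechanism and a Local Spatiotemporal Convolutional layer, it enables standard two-dimensional convolutional networks to extract the spatiotemporal features of gait effectively. The method lowers computational complexity while improving robustness to interfering factors such as viewpoint changes and clothing variations.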

English abstract

Gait recognition, as a promising biometric technology, identifies individuals through their unique walking patterns and offers distinctive advantages including non-invasiveness, long-range applicability, and resistance to deliberate disguise. Despite these merits, capturing the intrinsic motion patterns concealed within consecutive video frames remains challenging due to the complexity of video data and the interference of external covariates such as viewpoint changes, clothing variations, and carrying conditions. Existing approaches predominantly either rely on static appearance features extracted from individual silhouette frames or employ complex sequential models (e.g., LSTM, 3D convolutions) that demand substantial computational resources and sophisticated training strategies. To address these limitations, we propose a Local Spatiotemporal Convolutional Network (LSTCN), a structurally simple yet highly effective dual-branch architecture that endows standard two-dimensional convolutional networks with the capacity to extract temporal information. Specifically, we introduce a Global Bidirectional Spatial Pooling (GBSP) mechanism that reduces the dimensionality of gait tensors by decomposing spatial features into horizontal and vertical strip-based local representations, enabling the temporal dimension to participate in standard 2D convolution operations. Building upon this, we design a Local Spatiotemporal Convolutional (LSTC) layer that jointly processes temporal and spatial dimensions, allowing the network to adaptively learn strip-based gait motion patterns. We further extend this formulation with asymmetric convolution kernels that independently attend to the temporal, spatial, and joint spatiotemporal domains, thereby enriching the extracted feature representations.
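
The dimensionality reduction attributed to GBSP can be pictured as pooling each frame along one spatial axis at a time, so a sequence of shape T x H x W becomes a T x H map of horizontal strips and a T x W map of vertical strips. Each map is two-dimensional, with time as one axis, which is what lets an ordinary 2D convolution see temporal structure. The sketch below uses mean pooling on plain nested lists; the pooling operator, the function name, and the single-channel input are assumptions, not details from the paper.

```python
def strip_pool(frames):
    """Pool a T x H x W sequence into horizontal and vertical strip maps.

    Returns (horizontal, vertical):
      horizontal[t][h] is the mean of row h of frame t     (shape T x H)
      vertical[t][w]   is the mean of column w of frame t  (shape T x W)
    """
    horizontal = [[sum(row) / len(row) for row in frame] for frame in frames]
    vertical = [[sum(col) / len(col) for col in zip(*frame)] for frame in frames]
    return horizontal, vertical

# Two toy binary silhouette frames, each 3 rows by 2 columns.
frames = [
    [[1, 0], [1, 1], [0, 0]],
    [[0, 1], [1, 1], [0, 1]],
]
horizontal, vertical = strip_pool(frames)
```

A 2D convolution applied to `horizontal` would then slide jointly over time and strip index, which corresponds to the role the abstract gives to the LSTC layer.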

2605.14546 2026-05-15 cs.LG

Discovering Physical Directions in Weight Space: Composing Neural PDE Experts

Pengkai Wang, Pengwei Liu, Yuanyi Wang, Guanyu Chen, Xingyu Ren, Xiaolong Li, Zhongkai Hao, Yuting Kong, Qixin Zhang, Dong Ni

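AI summary: This study examines whether reusable physical structure forms in weight space when a shared neural partial differential equation (PDE) operator is fine-tuned to different physical regimes. It finds that the fine-tuning updates can be decomposed into a shared adaptation component and a direction aligned with the physical parameter, revealing physical directions in weight space. Building on this finding, the authors propose Calibration-Conditioned Merge (CCM), a post-hoc method that composes expert models from different regimes along the physical direction, using physical metadata or initial observations, and significantly improves prediction in out-of-distribution regimes.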

English abstract

Recent advances in neural operators have made partial differential equation (PDE) surrogate modeling increasingly scalable and transferable through large-scale pretraining and in-context adaptation. However, after a shared operator is fine-tuned to multiple regimes within a continuous physical family, it remains unclear whether the resulting weight-space updates merely form isolated regime experts or reveal reusable physical structure. Starting from a shared family anchor, we fine-tune low- and high-regime endpoint experts and show that their updates can be separated into a family-shared adaptation and a direction aligned with the underlying physical parameter. This separation reinterprets endpoint experts as finite-difference probes of a local physical direction in weight space, explaining why static averaging can interpolate between regimes but attenuates endpoint-specific physics. Building on this perspective, we propose Calibration-Conditioned Merge (CCM), a post-hoc coordinate readout method for composing neural PDE experts along this physical direction. Given physical metadata, a calibrated coordinate mapping, or a short observed rollout prefix, CCM infers the target composition coordinate and deploys a single merged checkpoint for the remaining rollout. We evaluate CCM on the reaction-diffusion system, viscosity-parameterized two-dimensional Navier-Stokes equations, and radial dam-break dynamics. Across these benchmarks, CCM achieves its strongest gains in extrapolative regimes, reducing out-of-distribution rollout error relative to the family anchor by 54.2%, 42.8%, and 13.8%, respectively. Further experiments across FNO scales, a DPOT-style backbone, and ablations confirm that endpoint fine-tuning is not arbitrary checkpoint drift, but reveals a calibratable physical direction for training-free transfer across PDE regimes.
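
The decomposition described here can be sketched on flat weight vectors. Taking the two endpoint experts, their midpoint minus the anchor serves as the family-shared adaptation and half their difference serves as the physical direction; a scalar coordinate then selects a merged checkpoint along that direction. This parametrisation (coordinate -1 for the low expert, +1 for the high expert) and the function name are assumptions for illustration; the abstract does not state CCM's exact formula, and the step that infers the coordinate from metadata or a rollout prefix is not shown.

```python
def merge(anchor, low, high, coord):
    """Compose a checkpoint along the direction spanned by two endpoint experts.

    anchor, low, high: flat weight vectors (lists of floats) of equal length
    coord: composition coordinate; -1 gives `low`, +1 gives `high`,
           0 gives their average, and |coord| > 1 extrapolates
    """
    shared = [(l + h) / 2 - a for a, l, h in zip(anchor, low, high)]
    direction = [(h - l) / 2 for l, h in zip(low, high)]
    return [a + s + coord * d for a, s, d in zip(anchor, shared, direction)]

# Toy two-parameter "checkpoints".
anchor = [0.0, 1.0]
low = [1.0, 1.0]
high = [3.0, 2.0]

averaged = merge(anchor, low, high, coord=0.0)       # static averaging
extrapolated = merge(anchor, low, high, coord=2.0)   # beyond the high-regime expert
```

In this parametrisation, static averaging is the special case `coord=0`, which keeps the shared adaptation but drops the endpoint-specific component entirely.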

2605.14544 2026-05-15 cs.AI

Complacent, Not Sycophantic: Reframing Large Language Models and Designing AI Literacy for Complacent Machines

Federico Germani, Giovanni Spitale

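AI summary: This paper re-examines the behaviour of large language models (LLMs) and argues that describing them as "sycophantic" is conceptually misleading. Their behaviour is better understood as "complacency": a tendency to agree with user input because training data, reward signals, and design mechanisms favour agreement over correction. The study stresses that models themselves have no sycophantic motive; their behaviour depends on developers' intentions and system design. The paper therefore argues that AI literacy education should help users recognize and counter the confirmation bias that models may reinforce.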

English abstract

Large language models are often described as sycophantic, in the sense that they appear to flatter users or mirror their beliefs. We argue that this label is conceptually misleading: sycophancy implies motives and strategic intent, which LLMs do not possess. Their behaviour is better understood as complacency, a structural tendency to agree with user input because training data, reward signals and design favour agreement and reinforcement over correction. We argue that this distinction matters. Whether developers act sycophantically or not, models themselves never are sycophants; they can only be made more or less complacent. This reframing locates agency in developers and institutions, not in the model. Because complacent models reinforce users' prior beliefs, we argue that AI literacy educational approaches should particularly focus on strategies to counter confirmation bias.

2605.14543 2026-05-15 cs.LG cs.AI

RxEval: A Prescription-Level Benchmark for Evaluating LLM Medication Recommendation

Shuhao Chen, Weisen Jiang, Changmiao Wang, Xiaoqing Wu, Xuanren Shi, Yu Zhang, James T. Kwok

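AI summary: RxEval is a prescription-level benchmark for evaluating the medication recommendation ability of large language models (LLMs), designed to address the shortcomings of existing benchmarks on fine-grained medication recommendation. Using multiple-choice questions, it asks the model to select specific medication-dose-route combinations from real prescriptions and generated distractors, given detailed patient information and a time-ordered clinical trajectory. Experiments show that RxEval discriminates well between models and that even state-of-the-art models still struggle to understand and reason over real clinical information.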

English abstract

Inpatient medication recommendation requires clinicians to repeatedly select specific medications, doses, and routes as a patient's condition evolves. Existing benchmarks formulate this task as admission-level prediction over coarse drug codes with multi-hot diagnostic and procedure code inputs, failing to capture the per-timepoint, information-rich nature of real prescribing. We propose RxEval, a prescription-level benchmark that evaluates LLM prescribing capability by multiple-choice questions: each question presents a detailed patient profile and time-ordered clinical trajectory, requiring selection of specific medication-dose-route triples from real prescriptions and patient-specific distractors generated via reasoning-chain perturbation. RxEval comprises 1,547 questions spanning 584 patients, 18 diagnostic categories, and 969 unique medications. Evaluation of 16 LLMs shows that RxEval is both challenging and discriminative: F1 ranges from 45.18 to 77.10 across models, and the best Exact Match is only 46.10%. Error analysis reveals that even frontier models may overlook stated patient information and fail to derive clinical conclusions.
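
The two reported metrics can be illustrated for a single multi-select question using the standard set-based definitions: F1 over the selected versus gold options, and Exact Match when the two sets coincide. Whether RxEval computes F1 per question and then averages, or in some other way, is not stated in the abstract, so the per-question form and the option labels below are assumptions.

```python
def f1_and_exact(selected, gold):
    """Set-based F1 and exact match for one multi-select question."""
    selected, gold = set(selected), set(gold)
    tp = len(selected & gold)  # correctly selected options
    precision = tp / len(selected) if selected else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return f1, selected == gold

# Hypothetical question: options A and C are the real prescriptions.
partial_f1, partial_exact = f1_and_exact(selected={"A", "B"}, gold={"A", "C"})
full_f1, full_exact = f1_and_exact(selected={"A", "C"}, gold={"A", "C"})
```

A partially correct selection still earns F1 credit but no Exact Match, which is consistent with the gap between the reported F1 range and the best Exact Match of 46.10%.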

2605.14542 2026-05-15 cs.AI

VerbalValue: A Socially Intelligent Virtual Host for Sales-Driven Live Commerce

Yuyan Chen

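AI summary: This study proposes VerbalValue, a virtual live-commerce host that aims to raise sales conversion through stronger verbal ability. Its core approach consists of building a product knowledge base and a sales terminology lexicon, collecting and annotating a large set of live-stream interactions, and fine-tuning a large language model on this data to generate more empathetic and persuasive responses. Experiments show that the model outperforms several mainstream large models in informativeness, factual accuracy, and viewer engagement, indicating notable potential for commercial application.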

Comments Accepted to the CVPR 2026 HiGen Workshop

English abstract

A skilled live-commerce host is not merely a narrator, but a sales agent who converts viewer curiosity into purchase intent through expert product knowledge, emotionally intelligent response tactics, and entertainment that serves as a vehicle for product exposure. Yet no existing AI system replicates this: conversational recommenders treat recommendation as a terminal act, while general-purpose LLMs hallucinate product claims and default to generic promotional templates that fail to engage or persuade. We present VerbalValue, a sales-conversion-oriented virtual host that turns exceptional verbal ability into real commercial value, built on three contributions. First, we construct a domain knowledge base of product specifications and a curated sales terminology lexicon that anchor product-related responses in verified expertise. Second, we collect and annotate 1,475 live-commerce interactions spanning diverse viewer intents. Third, we fine-tune a large language model on this data to deliver empathetic, commercially oriented responses, adapting to viewer intent through empathetic amplification, evidence-backed rebuttal, and humor-mediated deflection. Experiments against GPT-5.4, Claude Sonnet 4.6, Gemini 3.1 Pro, and other baselines demonstrate gains of 23% on informativeness and 18% on factual correctness, with consistent advantages in tactfulness and viewer engagement.

2605.14539 2026-05-15 cs.CL

Learning from Failures: Correction-Oriented Policy Optimization with Verifiable Rewards

Mengjie Ren, Jie Lou, Boxi Cao, Xueru Wen, Hongyu Lin, Xianpei Han, Le Sun, Xing Yu, Yaojie Lu

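AI summary: This paper proposes CIPO, a correction-oriented policy optimization method that addresses the inefficient learning caused by sparse rewards and weak credit assignment in Reinforcement Learning with Verifiable Rewards (RLVR). The method converts the model's own failed trajectories into correction-oriented supervision without relying on external information, improving both the model's error-correction ability and its learning effectiveness. Experiments show that CIPO significantly outperforms existing methods on multiple mathematical reasoning and code generation benchmarks and strengthens the model's intrinsic reasoning capacity.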

Comments Work in progress

English abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as an effective paradigm for improving the reasoning capabilities of large language models. However, RLVR training is often hindered by sparse binary rewards and weak credit assignment, resulting in ambiguous optimization signals and underutilization of the useful information embedded in failed trajectories. To address this challenge, we propose Correction-Oriented Policy Optimization (CIPO), a simple and effective extension to RLVR that converts on-policy failed trajectories into correction-oriented supervision, without relying on any external signals. By jointly optimizing correction samples derived from the model's own failed attempts together with the standard RLVR objective, CIPO improves learning effectiveness while explicitly enhancing the model's ability to correct its own errors. Extensive experiments across 11 benchmarks spanning mathematical reasoning and code generation demonstrate that CIPO consistently and significantly outperforms strong baselines in both reasoning and correction performance. Moreover, CIPO yields stronger pass@K gains, indicating that it improves the model's intrinsic reasoning capacity rather than merely redistributing probability mass over existing correct answers.

2605.14537 2026-05-15 cs.AI

Cattle Trade: A Multi-Agent Benchmark for LLM Bluffing, Bidding, and Bargaining

Robert Müller, Clemens Müller

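AI summary: This paper proposes Cattle Trade, a multi-agent benchmark for evaluating the strategic reasoning of large language models under imperfect information, adversarial interaction, and resource constraints. The benchmark combines auctions, hidden-offer trades, bargaining, bluffing, opponent modeling, and resource allocation in a single long-horizon game lasting 50 to 60 turns, testing agents' integrated decision-making under multiple competing objectives. The study finds that strategic coherence, resource discipline, and phase adaptivity matter more to model performance than any single skill or total spending, and it identifies failure modes that are common among large language models in the game.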

Comments malgai workshop at ICLR 2026

English abstract

We introduce Cattle Trade, a multi-agent benchmark for evaluating large language models (LLMs) as agents in strategic reasoning under imperfect information, adversarial interaction, and resource constraints. The benchmark combines auctions, hidden-offer trade challenges (TCs), bargaining, bluffing, opponent modeling, and resource allocation within a single long-horizon game lasting 50-60 turns. Unlike prior agent benchmarks that test these abilities in isolation, Cattle Trade evaluates whether agents integrate them across a competitive, multi-agent economic game with conflicting incentives. The benchmark logs every bid, TC offer, counteroffer, and card selection, enabling behavioural analysis beyond final scores or win rates. We evaluate seven cost-efficient language models and three deterministic code agents across 242 games. Strategic coherence, in particular spending efficiency, resource discipline, and phase-adaptive bidding, is associated with rank more strongly than spending volume or any single subskill. Two heuristic code agents outperform most tested LLMs, and behavioural traces surface recurring LLM failure modes including overbidding, self-bidding, bankrupt TC initiation, and weak opponent-state adaptation. Evaluating agentic competence requires benchmarks that test the joint deployment of multiple capabilities in multi-agent environments with conflicting incentives, uncertainty, and economic dynamics.

2605.14535 2026-05-15 cs.LG

Exploring Geographic Relative Space in Large Language Models through Activation Patching

Stef De Sabbata, Rahul Baiju, Stefano Mizzaro, Kevin Roitero

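AI summary: This paper explores how large language models (LLMs) internally process relative geographic space, using activation patching to reveal the underlying mechanisms by which they handle geographic relations. The study aims to improve understanding of LLM behaviour on geographic tasks and to provide theoretical support for applying such models safely and effectively.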

English abstract

The increased use of Large Language Models (LLMs) in geography raises substantial questions about the safety of integrating these tools across a wide range of processes and analyses, given our very limited understanding of their inner workings. In this extended abstract, we examine how LLMs process relative geographic space using activation patching, an emerging tool for mechanistic interpretability.

2605.14534 2026-05-15 cs.CV cs.AI cs.MM

PROVE: A Perceptual RemOVal cohErence Benchmark for Visual Media

Fuhao Li, Shaofeng You, Jiagao Hu, Yu Liu, Yuxuan Chen, Zepeng Wang, Fei Wang, Daiguo Zhou, Jian Luan

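AI summary Evaluating object removal in images and videos remains challenging because the task is inherently one-to-many and existing metrics often disagree with human perception. To address this, the paper proposes the RC (Removal Coherence) metrics, RC-S and RC-T, which measure the perceptual coherence of the removed region along the spatial and temporal dimensions respectively, and builds the PROVE-Bench benchmark dataset to support community evaluation. Experiments show that the RC metrics align with human perception more strongly than existing methods across a range of image and video benchmarks.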

Comments Project Page: https://xiaomi-research.github.io/prove/

English abstract

Evaluating object removal in images and videos remains challenging because the task is inherently one-to-many, yet existing metrics frequently disagree with human perception. Full-reference metrics reward copy-paste behaviors over genuine erasure; no-reference metrics suffer from systematic biases such as favoring blurry results; and global temporal metrics are insensitive to localized artifacts within edited regions. To address these limitations, we propose RC (Removal Coherence), a pair of perception-aligned metrics: RC-S, which measures spatial coherence via sliding-window feature comparison between masked and background regions, and RC-T, which measures temporal consistency via distribution tracking within shared restored regions across adjacent frames. To validate RC and support community benchmarking, we further introduce PROVE-Bench, a two-tier real-world benchmark comprising PROVE-M, an 80-video paired dataset with motion augmentation, and PROVE-H, a 100-video challenging subset without ground truth. Together, RC metrics and PROVE-Bench form the PROVE (Perceptual RemOVal cohErence) evaluation framework for visual media. Experiments across diverse image and video benchmarks demonstrate that RC achieves substantially stronger alignment with human judgments than existing evaluation protocols. The code for the RC metrics and PROVE-Bench is publicly available at: https://github.com/xiaomi-research/prove/.

2605.14527 2026-05-15 cs.LG cond-mat.mtrl-sci physics.comp-ph

Lang2MLIP: End-to-End Language-to-Machine Learning Interatomic Potential Development with Autonomous Agentic Workflows

Wenwen Li, Yuki Orimo, Nontawat Charoenphakdee

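AI summary This paper proposes Lang2MLIP, a multi-agent framework that enables end-to-end development of machine learning interatomic potentials (MLIPs) from natural-language input, lowering the barrier for non-experts. The method models MLIP development as a sequential decision-making problem in which an LLM-driven decision agent automatically selects actions to improve the model, with no predefined pipeline and with the ability to self-correct. Experiments on a multi-component solid electrolyte interphase system validate the approach and suggest that LLM-based multi-agent systems are promising for automating MLIP development.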

Comments 31 pages, 12 figures

English abstract

Developing machine learning interatomic potentials (MLIPs) for complex materials systems remains challenging because it requires expertise in atomistic simulations, machine learning, and workflow design, as well as iterative active learning procedures. Existing automated pipelines typically assume a fixed sequence of stages or depend on domain experts, which limits their adaptability to heterogeneous materials systems where the optimal curriculum is not known in advance. To lower the barrier to developing MLIPs for non-experts, we propose Lang2MLIP, a multi-agent framework that takes natural-language input and formulates end-to-end MLIP development as a sequential decision-making problem solved by large language models (LLMs). At each step, a decision-making agent observes the current dataset, model, evaluation results, and execution log, and then automatically selects an appropriate action to improve the model. This removes the need for a predefined pipeline and enables the agent to self-correct by revisiting earlier subsystems when new failures arise. We evaluate this approach on a solid electrolyte interphase (SEI) system with multiple components and interfaces. These results suggest that LLM-based multi-agent systems are a promising direction for automating MLIP development and making it more accessible to non-experts.

2605.14525 2026-05-15 cs.CV

From Sparse to Dense: Spatio-Temporal Fusion for Multi-View 3D Human Pose Estimation with DenseWarper

Ling Li, Changjie Chen, Yuyan Wang, Jiaqing Lyu, Kenglun Chang, Yiyun Chen, Zhidong Deng

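AI summary In multi-view 3D human pose estimation, conventional methods usually rely on images from different views captured at the same instant to predict the pose at that instant, overlooking the rich temporal dependencies between adjacent frames. This paper proposes a new input scheme, sparse interleaved input, which captures images from different views at different time points so that the model can exploit rich spatio-temporal information and improve performance. The approach can raise the frame rate of the output poses by using multiple cameras, breaking through single-view frame rate limits, while also reducing data redundancy. The study introduces the DenseWarper model, which uses epipolar geometry for efficient spatio-temporal heatmap exchange and achieves state-of-the-art performance on multiple datasets, outperforming conventional dense-input methods.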

English abstract

In multi-view 3D human pose estimation, models typically rely on images captured simultaneously from different camera views to predict a pose at a specific moment. While providing accurate spatial information, this traditional approach often overlooks the rich temporal dependencies between adjacent frames. To address this, we propose a novel input scheme for 3D human pose estimation: sparse interleaved input. This scheme uses images captured from different camera views at different time points (e.g., View 1 at time $t$ and View 2 at time $t+\delta$), allowing our model to capture rich spatio-temporal information and effectively boost performance. More importantly, this approach offers two key advantages. First, it can theoretically increase the output pose frame rate by a factor of N with N cameras, thereby breaking through single-view frame rate limitations and enhancing the temporal resolution of the output. Second, by using only a sparse subset of the available frames, our method reduces data redundancy while achieving better performance. We introduce the DenseWarper model, which leverages epipolar geometry for efficient spatio-temporal heatmap exchange. We conducted extensive experiments on the Human3.6M and MPI-INF-3DHP datasets. Results demonstrate that our method, utilizing only sparse interleaved images as input, outperforms traditional dense multi-view input approaches and achieves state-of-the-art performance. The source code for this work is available at: https://github.com/lingli1724/DenseWarper-ICLR2026

2605.14521 2026-05-15 cs.LG

Enjoy Your Layer Normalization with the Computational Efficiency of RMSNorm

Yuxin Guo, Yihao Yue, Yunhao Ni, Yizhou Ruan, Jie Luo, Wenjun Wu, Lei Huang

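AI summary This paper studies how to replace layer normalization (LN) in deep neural networks with the computationally cheaper RMSNorm without changing the model function. The core idea is to fold LN's centering operation into the preceding linear layer by introducing the column-centered constraint (CCC) and column-based weight centering (CBWC), yielding an equivalent replacement. The method applies to a variety of deep network architectures, and experiments show inference speedups of 2% to 12% across multiple tasks while preserving predictive performance.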

Comments 33 pages, 21 figures

English abstract

Layer normalization (LN) is a fundamental component in modern deep learning, but its per-sample centering and scaling introduce non-negligible inference overhead. RMSNorm improves efficiency by removing the centering operation, yet this may discard benefits associated with centering. This paper proposes a framework to determine whether an LN in an arbitrary DNN can be replaced by RMSNorm without changing the model function. The key idea is to fold LN's centering operation into upstream general linear layers by enforcing zero-mean outputs through the column-centered constraint (CCC) and column-based weight centering (CBWC). We extend the analysis to arbitrary DNNs, define such LNs as foldable LNs, and develop a graph-based detection algorithm. Our analysis shows that many LNs in widely used architectures are foldable, enabling exact inference-time conversion and end-to-end acceleration of 2% to 12% without changing model predictions. Experiments across multiple task families further show that, when exact equivalence is partially broken in practical training settings, our method remains competitive with vanilla LN while improving efficiency.
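
The folding idea in this abstract can be checked numerically. The following is a minimal sketch in plain Python, not the authors' implementation: it assumes an LN without affine parameters placed directly after a single linear layer, and the helper names (`linear`, `layer_norm`, `rms_norm`, `center_weights`) are hypothetical. Subtracting from each weight column its mean over the output units, and subtracting the mean from the bias, makes the layer's outputs zero-mean, so RMSNorm applied to the centered layer should match LN applied to the original one.

```python
import math
import random


def linear(W, b, x):
    """y = W x + b, with W stored as a list of rows (one row per output unit)."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def layer_norm(v, eps=1e-6):
    """LN without affine parameters: center, then scale by the standard deviation."""
    mu = sum(v) / len(v)
    var = sum((a - mu) ** 2 for a in v) / len(v)
    return [(a - mu) / math.sqrt(var + eps) for a in v]


def rms_norm(v, eps=1e-6):
    """RMSNorm without gain: scale by the root mean square, no centering."""
    ms = sum(a * a for a in v) / len(v)
    return [a / math.sqrt(ms + eps) for a in v]


def center_weights(W, b):
    """Subtract the mean over output units from every weight column and from the bias."""
    n_out, n_in = len(W), len(W[0])
    col_mean = [sum(W[i][j] for i in range(n_out)) / n_out for j in range(n_in)]
    Wc = [[W[i][j] - col_mean[j] for j in range(n_in)] for i in range(n_out)]
    b_mean = sum(b) / n_out
    return Wc, [bi - b_mean for bi in b]


random.seed(0)
n_in, n_out = 5, 8  # illustrative sizes, chosen arbitrarily
W = [[random.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out)]
b = [random.gauss(0, 1) for _ in range(n_out)]
x = [random.gauss(0, 1) for _ in range(n_in)]

Wc, bc = center_weights(W, b)
reference = layer_norm(linear(W, b, x))   # original layer followed by LN
folded = rms_norm(linear(Wc, bc, x))      # centered layer followed by RMSNorm
max_gap = max(abs(r - f) for r, f in zip(reference, folded))
print(f"max |LN - RMSNorm(folded)| = {max_gap:.2e}")
```

The two outputs agree up to floating-point error because the mean square of a zero-mean vector equals its variance. The sketch covers only the single-layer case; the graph-based detection of foldable LNs described in the abstract is not reproduced here.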

2605.14518 2026-05-15 cs.CV cs.LG

ArcGate: Adaptive Arctangent Gated Activation

Avik Bhattacharya, Siddhant Dnyanesh Gole, Subhasis Chaudhuri, Alejandro C. Frery, Biplab Banerjee

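AI summary This paper proposes ArcGate, a new adaptive arctangent gated activation function that generates diverse activation shapes through a three-stage non-linear transformation. Unlike conventional fixed-shape activations (such as ReLU and GELU), it has seven learnable parameters per network layer, allowing it to tune its non-linearity to the feature hierarchy and data distribution. Experiments on several remote sensing datasets show that ArcGate outperforms the baselines, with notably stronger robustness under noise, and reveal how its parameters evolve with network depth, suggesting that ArcGate is a general and adaptive activation function suited to high-resolution earth observation tasks.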

English abstract

Activation functions are central to deep networks, influencing non-linearity, feature learning, convergence, and robustness. This paper proposes the Adaptive Arctangent Gated Activation (ArcGate) function, a flexible formulation that generates a broad spectrum of activation shapes via a three-stage non-linear transformation. Unlike conventional fixed-shape activations such as ReLU, GELU, or SiLU, ArcGate uses seven learnable parameters per layer, allowing the neural network to autonomously optimize its non-linearity to the specific requirements of the feature hierarchy and data distribution. We evaluate ArcGate using ResNet-50 and Vision Transformer (ViT-B/16) architectures on three widely used remote sensing benchmarks: PatternNet, UC Merced Land Use, and the 13-band EuroSAT MSI multispectral dataset. Experimental results show that ArcGate consistently outperforms standard baselines, achieving a peak overall accuracy of 99.67% on PatternNet. Most notably, ArcGate exhibits superior structural resilience in noisy environments, maintaining a 26.65% performance lead over ReLU under moderate Gaussian noise (standard deviation 0.1). Analysis of the learned parameters reveals a depth-dependent functional evolution, where the model increases gating strength in deeper layers to enhance signal propagation. These findings suggest that ArcGate is a robust and adaptive general node activation function for high-resolution earth observation tasks.

2605.14517 2026-05-15 cs.CL cs.AI

Dimension-Level Intent Fidelity Evaluation for Large Language Models: Evidence from Structured Prompt Ablation

Gang Peng

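AI summary This study proposes a dimension-level intent fidelity evaluation framework for assessing, at a finer granularity, how well large language models preserve structural form and user intent. Through a structured prompt ablation experiment, it analyses 2,880 outputs across three languages, three task domains, and six models, revealing systematic discrepancies between holistic scores and dimension-level intent deficits. The experiments show that relying on holistic evaluation alone can mask a model's shortcomings on specific intents, whereas dimension-level evaluation reflects model quality more accurately and is an important complement when evaluating models on user-specific tasks.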

Comments Preprint. 30 tasks, 3 languages, 6 LLMs, 2,880 outputs; includes human evaluation and structured prompt ablation

English abstract

Holistic evaluation scores capture overall output quality but do not distinguish whether a model reproduced the structural form of a user's request from whether it preserved the user's specific intent. We propose a dimension-level intent fidelity evaluation framework, applied here through a structured prompt ablation study across 2,880 outputs spanning three languages, three task domains, and six LLMs, that separately measures structural recovery and intent fidelity for each semantic dimension. This framework reveals a systematic structural-fidelity split: among Chinese-language outputs with complete paired scores, 25.7% received perfect holistic alignment scores (GA=5) while exhibiting measurable dimensional intent deficits; among English-language outputs, this proportion rose to 58.6%. Human evaluation confirmed that these split-zone outputs represent genuine quality deficits and that dimensional fidelity scores track human judgements more reliably than holistic scores do. A public-private decomposition of 2,520 ablation cells characterises when models successfully compensate for missing intent and when they fail, while proxy annotation distinguishes prior inferability from default recoverability. A weight-perturbation experiment shows that moderate misalignment is typically absorbed, whereas severe dimensional inversion is consistently harmful. These findings demonstrate that dimension-level intent fidelity evaluation is a necessary complement to holistic assessment when evaluating LLM outputs for user-specific tasks.

2605.14513 2026-05-15 cs.CV cs.AI

HASTE: Training-Free Video Diffusion Acceleration via Head-Wise Adaptive Sparse Attention

Xuzhe Zheng, Yuexiao Ma, Jing Xu, Xiawu Zheng, Rongrong Ji, Fei Chao

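AI summary This paper proposes HASTE, a training-free method for accelerating video diffusion, which targets the efficiency-quality trade-off that existing sparse attention mechanisms face in video generation because of quadratic complexity and fixed thresholds. It introduces a head-wise adaptive framework with two modules, temporal mask reuse and error-guided budget calibration, that reduce mask prediction overhead and optimise how sparsity is allocated across attention heads. Experiments show that HASTE substantially speeds up inference while preserving video quality.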

English abstract

Diffusion-based video generation has advanced substantially in visual fidelity and temporal coherence, but practical deployment remains limited by the quadratic complexity of full attention. Training-free sparse attention is attractive because it accelerates pretrained models without retraining, yet existing online top-$p$ sparse attention still spends non-negligible cost on mask prediction and applies shared thresholds despite strong head-level heterogeneity. We show that these two overlooked factors limit the practical speed-quality trade-off of training-free sparse attention in Video DiTs. To address them, we introduce a head-wise adaptive framework with two plug-in components: Temporal Mask Reuse, which skips unnecessary mask prediction based on query-key drift, and Error-guided Budgeted Calibration, which assigns per-head top-$p$ thresholds by minimizing measured model-output error under a global sparsity budget. On Wan2.1-1.3B and Wan2.1-14B, our method consistently improves XAttention and SVG2, achieving up to 1.93 times speedup at 720P while maintaining competitive video quality and similarity metrics.
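
To make the top-$p$ terminology concrete, here is a minimal sketch of top-$p$ key selection for a single query row, with a separate threshold per head. It is an illustration under stated assumptions rather than the HASTE implementation: the function names, the example scores, and the per-head thresholds are all hypothetical, and real systems operate on blocks of a large attention map rather than on four scores.

```python
import math


def softmax(scores):
    """Numerically stable softmax over one row of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_mask(scores, p):
    """Keep the smallest set of keys whose cumulative attention mass reaches p."""
    probs = softmax(scores)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = [False] * len(probs)
    mass = 0.0
    for i in order:
        keep[i] = True
        mass += probs[i]
        if mass >= p:
            break
    return keep


def sparsity(mask):
    """Fraction of keys that are skipped."""
    return 1.0 - sum(mask) / len(mask)


# One query row of raw attention scores, reused for two hypothetical heads.
scores = [2.0, 1.0, 0.0, -1.0]
per_head_p = {"head_0": 0.8, "head_1": 0.9}  # hypothetical per-head thresholds
masks = {head: top_p_mask(scores, p) for head, p in per_head_p.items()}
for head, mask in masks.items():
    print(head, mask, f"sparsity={sparsity(mask):.2f}")
```

With these scores the softmax mass is roughly 0.64, 0.24, 0.09, and 0.03, so a threshold of 0.8 keeps two keys and a threshold of 0.9 keeps three. A shared threshold forces every head to the same mass target, which is the limitation the abstract attributes to existing methods.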

2605.14500 2026-05-15 cs.SD cs.HC eess.IV

Physics-Based iOCT Sonification for Real-time Interaction Awareness in Subretinal Injection

Luis D. Reyes Vargas, Veronica Ruozzi, Andrea K. M. Ross, Shervin Dehghani, Michael Sommersperger, Koorosh Faridpooya, Mohammad Ali Nasseri, Merle Fairhurst, Nassir Navab, Sasan Matinfar

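AI summary This paper proposes a physics-based, real-time iOCT sonification framework to improve real-time interaction awareness during subretinal injection surgery. The method maps retinal layer information obtained from iOCT to auditory feedback, letting the surgeon perceive needle position and retinal deformation by ear, which reduces visual load and improves surgical precision. Experiments show that the method significantly outperforms existing methods at retinal layer identification and deformation detection, indicating substantial potential for clinical use.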

English abstract

Subretinal injection is a delicate vitreoretinal procedure requiring precise needle placement within the subretinal space while avoiding perforation of the retinal pigment epithelium (RPE), a layer directly beneath the target with extremely limited regenerative capacity. To enhance depth perception during cannula advancement, intraoperative optical coherence tomography (iOCT) offers high-resolution cross-sectional visualization of needle-tissue interaction; however, interpreting these images requires sustained visual attention alongside the en face microscope view, thereby increasing cognitive load during critical phases and placing additional demands on the surgeon's proprioceptive control. In this paper, we propose a structured, real-time sonification framework designed for extensible mapping of iOCT-derived anatomical features into perceptual auditory feedback. The method employs a physics-inspired acoustic model driven by segmented retinal layers from a stream of iOCT B-scans, with needle motion and injection-induced retinal layer displacements serving as excitation inputs to the sound model, enabling perception of tool position and retinal deformation. In a controlled user study (n=34), the proposed sonification achieved high retinal layer identification accuracy and robust detection of retinal deformation-related events, significantly outperforming a state-of-the-art baseline in overall event identification (83.4% vs. 60.6%, p < 0.001), with gains driven primarily by enhanced detection of injection-induced retinal deformation. Evaluation by experts (n=4) confirmed the clinical relevance and potential intraoperative applicability of the method. These results establish structured iOCT sonification as a viable complementary modality for real-time surgical guidance in subretinal injection.

2605.14497 2026-05-15 cs.LG cs.AI

ROAD: Adaptive Data Mixing for Offline-to-Online Reinforcement Learning via Bi-Level Optimization

Letian Yang, Xu Liu, Yiqiang Lu, Jian Liu, Weiqiang Wang, Shuai Li

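AI summary This paper proposes ROAD, an offline-to-online reinforcement learning framework that performs adaptive data mixing through bi-level optimization to address the non-stationary distribution shift between offline data and the online policy. The method models data selection as a bi-level optimization process, with policy performance optimised at the outer level and conventional Q-learning updates at the inner level, and introduces a multi-armed bandit mechanism for dynamic data replay. Experiments show that ROAD outperforms existing methods across multiple datasets, achieving better stability and long-term performance without manual tuning.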

Comments 20 pages, 9 figures, 7 tables. Accepted to IJCAI 2026

English abstract

Offline-to-online reinforcement learning harnesses the stability of offline pretraining and the flexibility of online fine-tuning. A key challenge lies in the non-stationary distribution shift between offline datasets and the evolving online policy. Common approaches often rely on static mixing ratios or heuristic-based replay strategies, which lack adaptability to different environments and varying training dynamics, resulting in a suboptimal trade-off between stability and asymptotic performance. In this work, we propose Reinforcement Learning with Optimized Adaptive Data-mixing (ROAD), a dynamic plug-and-play framework that automates the data replay process. We identify a fundamental objective misalignment in existing approaches. To tackle this, we formulate the data selection problem as a bi-level optimization process, interpreting the data mixing strategy as a meta-decision governing the policy performance (outer-level) during online fine-tuning, while the conventional Q-learning updates operate at the inner level. To make it tractable, we propose a practical algorithm using a multi-armed bandit mechanism. This is guided by a surrogate objective approximating the bi-level gradient, which simultaneously maintains offline priors and prevents value overestimation. Our empirical results demonstrate that this approach consistently outperforms existing data replay methods across various datasets, eliminating the need for manual, context-specific adjustments while achieving superior stability and asymptotic performance.
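
As a rough illustration of how a multi-armed bandit can choose a data mixing ratio, the sketch below runs the standard UCB1 rule over a few candidate offline-data fractions. It is not ROAD's algorithm: the abstract does not specify the bandit rule or the surrogate objective, so UCB1, the candidate ratios, and the `toy_feedback` signal are all assumptions made for the example.

```python
import math


def ucb1_select(counts, means, t):
    """Return the arm index chosen by UCB1 at round t (1-based)."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # try every arm once before using confidence bounds
    scores = [m + math.sqrt(2.0 * math.log(t) / n) for m, n in zip(means, counts)]
    return max(range(len(scores)), key=lambda a: scores[a])


def toy_feedback(ratio):
    """Hypothetical stand-in for a policy-improvement signal; peaks at ratio 0.25."""
    return 1.0 - abs(ratio - 0.25)


ratios = [0.0, 0.25, 0.5, 0.75]  # candidate offline-data fractions per batch (assumed)
counts = [0] * len(ratios)       # number of times each ratio was used
means = [0.0] * len(ratios)      # running mean feedback per ratio

for t in range(1, 501):
    arm = ucb1_select(counts, means, t)
    reward = toy_feedback(ratios[arm])
    counts[arm] += 1
    means[arm] += (reward - means[arm]) / counts[arm]  # incremental average

best = max(range(len(ratios)), key=lambda a: counts[a])
print("pull counts:", counts, "-> preferred ratio:", ratios[best])
```

In an actual offline-to-online setting the feedback would come from training itself and would drift over time, which is why a stationary rule like UCB1 is only a starting point for intuition.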

2605.14494 2026-05-15 cs.AI cs.LG

Learning Scenario Reduction for Two-Stage Robust Optimization with Discrete Uncertainty

Tianjue Lin, Jianan Zhou, Jieyi Bi, Yaoxin Wu, Wen Song, Zhiguang Cao, Jie Zhang

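AI summary This paper studies two-stage robust optimization with discrete uncertainty, which is hard to solve because of its high computational complexity. To address this, the authors propose NeurPRISE, a neural surrogate model based on graph neural networks and Transformers that uses imitation learning to learn a scenario selection policy from the problem-driven scenario reduction method PRISE, greatly improving computational efficiency while maintaining solution quality. Experiments show that NeurPRISE performs and scales well on several two-stage robust optimization problems and has strong zero-shot generalization.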

English abstract

Two-Stage Robust Optimization (2RO) with discrete uncertainty is challenging, often rendering exact solutions prohibitive. Scenario reduction alleviates this issue by selecting a small, representative subset of scenarios to enable tractable computation. However, existing methods are largely problem-agnostic, operating solely on the uncertainty set without consulting the feasible region or recourse structure. In this paper, we introduce PRISE, a problem-driven sequential lookahead heuristic that constructs reduced scenario sets by evaluating the marginal impact of each scenario. While PRISE yields high-quality scenario subsets, each selection step requires solving multiple subproblems, making it computationally expensive at scale. To address this, we propose NeurPRISE, a neural surrogate model built on a GNN-Transformer backbone that encodes the per-scenario structure via graph convolution and captures cross-scenario interactions through attention. NeurPRISE is trained via imitation learning with a gain-aware ranking objective, which distills marginal gain information from PRISE into a learned scoring function for scenario ranking and selection. Extensive results on three 2RO problems show that NeurPRISE consistently achieves competitive regret relative to comprehensive methods, maintains strong scalability with varying numbers of scenarios, and delivers 7-200x speedup over PRISE. NeurPRISE also exhibits strong zero-shot generalization, effectively handling instances with larger problem scales (up to 5x), more scenarios (up to 4x), and distribution shifts.

2605.14489 2026-05-15 cs.LG cs.SY eess.SY

A Novel Schur-Decomposition-Based Weight Projection Method for Stable State-Space Neural-Network Architectures

Sergio Vanegas, Lasse Lensu, Fredy Ruiz

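AI summary This paper proposes a weight projection method based on the Schur decomposition to ensure the stability of linear discrete-time state-space neural-network layers. The method dynamically projects the quasi-upper-triangular factor of the state matrix's real Schur decomposition onto the nearest stable matrix, guaranteeing asymptotic stability while preserving model accuracy. Experiments show identification accuracy and convergence speed comparable to state-of-the-art methods on synthetic linear systems, and good training behaviour in non-linear neural-network architectures on real-world datasets.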

Comments 32 pages, 13 figures. Source code at https://codeberg.org/sergiovaneg/SchurSS

English abstract

Building black-box models for dynamical systems from data is a challenging problem in machine learning, especially when asymptotic stability guarantees are required. In this paper, we introduce a novel stability-ensuring and backpropagation-compatible projection scheme based on the Schur decomposition for the state matrix of linear discrete-time state-space layers, as well as an alternative pre-factorized formulation of the methodology. The proposed methods dynamically project the quasi-triangular factor of the state matrix's real Schur decomposition onto its nearest stable peer, ensuring stable dynamics with minimal overparameterization. Experiments on synthetic linear systems demonstrate that the method achieves accuracy and convergence rates comparable to those of state-of-the-art stable-system identification techniques, despite a marginal increase in computational complexity. Furthermore, the lower weight count facilitates convergence during training without sacrificing accuracy in stacked neural-network architectures with static nonlinearities targeting real-world datasets. These results suggest that the Schur-based projection provides a numerically robust framework for identifying complex dynamics on par with the state of the art while satisfying strict asymptotic-stability requirements.