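arXivDaily daily academic digest: synced with the full arXiv data feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.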

2605.24593 2026-05-26 cs.CV

Self-supervised Dynamic Heterogeneous Degradation Modeling for Unified Zero-Shot Image Restoration

XiaoWan Hu, Jing Yang, HeNan Liu, HuaQiu Li, Mai Xu

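AI summary: Proposes a unified physical zero-shot image restoration framework that reparameterizes heterogeneous degradations into a homogeneous distribution and introduces a dynamic quality-refinement strategy, reaching state-of-the-art performance under single and mixed degradations.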

Abstract

Zero-shot image restoration provides a flexible way to handle diverse degradations without task-specific training. However, existing methods typically rely on stacked layers or pre-trained features to enhance degradation expression, while overlooking physically consistent priors. The insufficient degradation prompts impose the heavy training burden and high sampling costs during zero-shot diffusion. Moreover, the fixed inference trajectory often collapses to suboptimal solutions under complex corruptions. We observe that heterogeneous degradations can be reparameterized into a minimal set of physically coherent parameters for compact representation. Based on this insight, we first propose a unified physical zero-shot image restoration (UP-ZeroIR) framework that explicitly models heterogeneous degradations into a homogeneous all-in-one distribution. The distribution can be optimized directly in the latent space, enabling principled solution exploration and effective prompt adaptation. Besides, we introduce a dynamic quality-refinement strategy that adaptively adjusts the diffusion trajectory for robust globally optimal convergence. Extensive experiments demonstrate that our method achieves state-of-the-art performance across both single and mixed degradations. Our code is available at https://github.com/yangjinglyy/UP-ZeroIR

2605.24592 2026-05-26 cs.RO

MuGen: Multi-Skill Generative Locomotion Controller for Humanoid Robots

Yusen Feng, Xiang Wang, Heyuan Yao, Zixi Kang, Xinyu Huo, Boyang Yu, Pengyun Qiu, Ruijie Zhao, Baoquan Chen, Libin Liu

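AI summary: Proposes the MuGen framework, which uses a VQ-VAE and teacher-student policy distillation to learn generative motion representations from heterogeneous human motion data, enabling humanoid robots to perform multi-skill locomotion and imitate unseen motions.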

2605.24590 2026-05-26 cs.CV cs.LG stat.ML

Physen-Noise2Noise: Physics-Guided Self-Supervised Defocus Deblurring with Bias Correction under Low-Light Conditions

Ziyan Huang, Lang Wu, Hongji Wang, Yifei Liu, Dongliang Tang, Hongqiao Wang

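AI summary: Proposes Physen-Noise2Noise, a physics-model-based self-supervised defocus deblurring framework that uses a learnable noise bias parameter and a frequency-domain constraint to jointly correct biased noise and recover high-frequency detail without clean reference images.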

Comments 14 pages

Abstract

Low-light, long-exposure defocus deblurring remains a challenging problem due to the simultaneous presence of severe blur and complex biased noise. Existing methods typically rely on simplified noise assumptions, which limits their effectiveness under realistic imaging conditions. In this work, we propose Physen-Noise2Noise, a self-supervised deblurring framework guided by the physical model of defocus imaging, which leverages noisy multi-frame observations without requiring clean reference images. Unlike conventional Noise2Noise-based approaches that assume zero-mean noise, we derive a frequency-domain constraint inherent to the defocus imaging process and incorporate it into the learning framework via a learnable noise bias parameter. In addition, a multi-frame noisy initialization strategy is introduced to suppress complex biased noise prior to deblurring, providing a more stable starting point for reconstruction. This formulation explicitly models biased noise and enables joint bias correction and high-frequency detail recovery during training. Furthermore, we develop a pretrain-finetune variant to enhance robustness and generalization under challenging noise conditions. Extensive experiments on both simulation and real-world datasets demonstrate that the proposed method consistently outperforms state-of-the-art self-supervised approaches for defocus deblurring in the presence of complex biased noise.

2605.24588 2026-05-26 cs.AI cs.LG

HeartBeatAI: An Interpretable and Robust Deep Learning Framework for Multi-Label ECG Arrhythmia Detection

Shubham Gupta, Nikhil Panwar, Partha Pratim Roy

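AI summary: Proposes the HeartBeatAI framework, which combines domain generalization, multi-scale feature aggregation, and clinical explainability, using a Squeeze-and-Excitation ResNet and a Multi-Layer Concentration Pipeline for robust 12-lead ECG classification. It reaches a 98% macro F1-score under intra-source evaluation, but leave-one-domain-out evaluation shows that detecting rare anomalies remains a challenge for cross-institutional deployment.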

Abstract

While Deep Learning (DL) enhances automated electrocardiogram (ECG) analysis, clinical deployment is hindered by class imbalance and the generalization gap. This paper presents HeartBeatAI, a deep learning framework combining domain generalization, multi-scale feature aggregation, and clinical explainability for robust 12-lead ECG classification. Moving beyond image-based paradigms, HeartBeatAI integrates a Squeeze-and-Excitation (SE) ResNet to isolate diagnostic leads alongside a Multi-Layer Concentration Pipeline to capture macro-rhythm and micro-morphological anomalies. To mitigate domain shift, the framework employs MixStyle regularization and Label Smoothing. Rigorous benchmarking across four large-scale datasets using intra-source and Leave-One-Domain-Out (LODO) protocols demonstrates high performance (98% Macro F1-score) under intra-source conditions. However, LODO evaluations reveal significant degradation in detecting rare anomalies, highlighting a persistent challenge in cross-institutional deployment.
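
The squeeze-and-excitation idea mentioned in the abstract can be sketched on a multi-channel 1D signal. This is a minimal illustration, not the HeartBeatAI implementation: treating each channel as one ECG lead, the function name `squeeze_excite`, and the toy weights are all assumptions made here.

```python
import math


def squeeze_excite(signal, w_reduce, w_expand):
    """Reweight channels of a multi-channel 1D signal with an SE-style gate.

    signal:   list of C channels, each a list of samples (assumed: one per lead)
    w_reduce: R x C weight matrix (hypothetical; learned in a real model)
    w_expand: C x R weight matrix (hypothetical; learned in a real model)
    """
    # Squeeze: a global average over time gives one descriptor per channel.
    squeezed = [sum(channel) / len(channel) for channel in signal]
    # Excitation: bottleneck layer with ReLU, then one sigmoid gate per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w_reduce]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w_expand]
    # Scale: every sample of a channel is multiplied by that channel's gate.
    scaled = [[g * x for x in channel] for g, channel in zip(gates, signal)]
    return scaled, gates


# Toy input: two channels with two samples each; the weights are made up.
signal = [[1.0, 3.0], [2.0, 2.0]]
scaled, gates = squeeze_excite(signal, w_reduce=[[1.0, 1.0]], w_expand=[[1.0], [-1.0]])
print(gates)
```

With these made-up weights the first channel receives a gate above 0.5 and the second a gate below 0.5, which is the sense in which an SE block can emphasize some channels over others.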

2605.24585 2026-05-26 cs.CL q-bio.NC

Word Class Representations Spontaneously Emerge from Successor Representations Trained on Natural Language

Mathis Immertreu, Achim Schilling, Thomas Kinfe, Patrick Krauss

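AI summary: Applies the successor representation (SR) framework from reinforcement learning to natural language by training neural networks to predict future word distributions. Without supervision, a geometric structure of word classes (such as nouns, verbs, and adjectives) emerges spontaneously, and the prediction horizon affects the level of structure.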

Abstract

Language models are typically trained to predict the next token in a sequence. Here, we explore an alternative predictive principle from reinforcement learning: Successor Representations (SRs), which model the expected discounted distribution of future states rather than the immediate next state. We transfer this framework to natural language and train neural networks to predict future word distributions across multiple temporal horizons, thereby learning representations of long-range transition structure. We train a deep residual neural network on WikiText-103 (103 million tokens; 20,000-word vocabulary) and optimize successor representations as probability distributions using KL divergence. Without explicit linguistic supervision, structured language representations emerge spontaneously. After training, the learned space develops a clear geometric organization with respect to part-of-speech (POS) categories: nouns, verbs, and adjectives become separable and recoverable through unsupervised clustering. This organization depends systematically on predictive horizon, with short horizons producing the strongest syntactic structure and longer horizons increasingly integrating broader contextual and semantic information. At finer resolutions, additional interpretable lexical substructure emerges, revealing coherent subclasses within major word categories. These findings suggest that syntactic categories need not be explicitly encoded but may arise as a consequence of predictive sequence learning. To our knowledge, this work provides the first systematic application of successor representations to natural language and establishes a conceptual bridge between reinforcement learning, linguistics, and cognitive neuroscience.
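
The notion of an expected discounted distribution over future words can be made concrete with a small counting sketch. It tabulates targets from a toy sentence instead of training a network, and the discount `gamma`, the normalization to a probability distribution, and the function names are assumptions for illustration, not details taken from the paper.

```python
import math
from collections import defaultdict


def successor_targets(tokens, gamma):
    """Empirical discounted distribution over future words, per word type.

    A word k steps ahead contributes weight gamma**(k-1), so gamma = 0
    reduces to ordinary next-word statistics.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for j in range(i + 1, len(tokens)):
            totals[word][tokens[j]] += gamma ** (j - i - 1)
    targets = {}
    for word, weights in totals.items():
        norm = sum(weights.values())
        targets[word] = {w: v / norm for w, v in weights.items()}
    return targets


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two distributions stored as dicts (q is floored at eps)."""
    return sum(pv * math.log(pv / max(q.get(w, 0.0), eps))
               for w, pv in p.items() if pv > 0.0)


tokens = "the cat sat on the mat".split()
near = successor_targets(tokens, gamma=0.0)  # next-word statistics only
far = successor_targets(tokens, gamma=0.9)   # longer predictive horizon
print(near["the"])
print(kl_divergence(far["the"], near["the"]))
```

Raising `gamma` spreads probability mass over words further ahead, which is the knob the abstract refers to as the predictive horizon. The quadratic loop is only suitable for toy inputs.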

2605.24584 2026-05-26 cs.LG cs.AI

Association Between Echocardiographic Features and AI-ECG Heart Failure Prediction

Elias Stenhede, Eivind Bjørkan Orstad, Torbjørn Omland, Henrik Schirmer, Arian Ranjbar

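AI summary: A retrospective analysis of data from 8,147 patients finds that AI-ECG-predicted heart failure risk is associated mainly with measures of systolic function such as global longitudinal strain, and that it also captures diastolic abnormalities in patients with preserved ejection fraction.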

Abstract

Artificial intelligence-enabled electrocardiography (AI-ECG) can detect heart failure (HF), including disease not captured by left ventricular ejection fraction (LVEF), but the cardiac phenotypes underlying model predictions remain unclear. We therefore investigated whether AI-ECG-predicted HF risk aligns with established echocardiographic measures of myocardial dysfunction, remodelling, and filling pressures. We retrospectively analysed ECG and echocardiography data from 8147 patients who underwent both examinations within three days at Akershus University Hospital between 1 January 2023 and 1 June 2025. A previously validated AI-ECG model for HF detection was applied to all ECGs. Spearman's rank correlation $ρ$ quantified associations between echocardiographic parameters and AI-ECG risk. Subgroup analyses were performed by sex and left ventricular ejection fraction (LVEF). External validation included 36,286 ECG-echocardiography pairs from Columbia University Irving Medical Center. Global longitudinal strain (GLS) showed the strongest correlation ($ρ$=0.57), followed by mitral annular plane systolic excursion (MAPSE) ($ρ$=-0.49) and LVEF ($ρ$=-0.45). In patients with LVEF>50%, correlations remained substantial for GLS, MAPSE, and diastolic-related parameters. Volumetric left ventricular indices correlated less strongly in women, whereas diastolic indices showed stronger correlations in women than in men. Physiological validation showed that AI-ECG HF risk predictions align primarily with measures of systolic function, particularly global longitudinal strain, while also capturing diastolic-related abnormalities in patients with preserved LVEF. This approach may improve clinical interpretability and identify opportunities for model refinement.
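
Spearman's rank correlation, the statistic used throughout the abstract, is the Pearson correlation of the ranks of two variables. The sketch below is a dependency-free illustration with average ranks for ties; the sample values are invented and are not data from the study.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented values: a model risk score against an ejection-fraction-like measure.
risk = [0.10, 0.35, 0.20, 0.80, 0.60]
measure = [62.0, 50.0, 58.0, 30.0, 41.0]
print(spearman_rho(risk, measure))
```

Because only ranks enter the calculation, any monotone relationship gives a rho of magnitude 1 regardless of its shape, which suits comparisons between a model score and clinical measurements on different scales.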

2605.24573 2026-05-26 cs.CL

Summoning the Oracle to Slay It: Mitigating Look-Ahead Bias in Financial Backtesting with Large Language Models

Weixian Waylon Li, Mengyu Wang, Tiejun Ma

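AI summary: Proposes FinCAD, which uses adversarial bias discovery and an entity- and date-adaptive rule to suppress an LLM's memory of historical outcomes without retraining, mitigating parametric look-ahead bias in financial backtesting.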

Abstract

Backtesting large language models (LLMs) on historical financial data is unreliable because pre-training cuts off after the events happened. An LLM trained in 2024 already "knows" which way 2018-2020 stocks moved. We name this failure parametric look-ahead bias and propose FinCAD, an inference-time adaptation of Context-Aware Decoding that suppresses an LLM's memory of historical outcomes without retraining. FinCAD pairs an adversarial bias-discovery pipeline that learns a model-specific memory-activating prior prompt with an entity- and date-adaptive rule that scales the CAD strength to per-(entity, date) memorisation, so the penalty fires on memorised in-sample dates and decays to zero out-of-sample. Across five 7-14B LLMs and five mega-cap equities, FinCAD cuts in-sample backtest returns by up to -67.1% on memorised dates while leaving 2025 out-of-sample returns within $8K and Sharpe within 0.10 of baseline, and preserves general-purpose reasoning within 1.7 pts. On an eleven-model leaderboard, it raises the in-sample / out-of-sample Spearman correlation from +0.779 to +0.846, recovering rankings that genuinely predict out-of-sample performance.
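
Context-Aware Decoding contrasts two sets of next-token logits, commonly written as (1 + alpha) * logits_with_context - alpha * logits_without_context. The sketch below applies that general form with a strength that scales with a memorisation score. The abstract does not give FinCAD's exact rule, so the linear scaling, the score range, the function names, and the toy logits are all assumptions.

```python
import math


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_logits(context_logits, prior_logits, alpha):
    """CAD-style contrast: amplify the context run, subtract the prior run."""
    return [(1 + alpha) * c - alpha * p for c, p in zip(context_logits, prior_logits)]


def adaptive_alpha(memorisation_score, alpha_max=1.0):
    """Hypothetical rule: strength grows linearly with a score clipped to [0, 1]."""
    return alpha_max * min(max(memorisation_score, 0.0), 1.0)


# Toy two-token vocabulary ["up", "down"]; all numbers are made up.
context_logits = [2.0, 1.0]  # model reading the dated market context
prior_logits = [3.0, 0.0]    # model under a memory-activating prior prompt

in_sample = contrastive_logits(context_logits, prior_logits, adaptive_alpha(1.0))
out_of_sample = contrastive_logits(context_logits, prior_logits, adaptive_alpha(0.0))
print(softmax(in_sample))
print(softmax(out_of_sample))
```

With a score of 0 the adjusted logits equal the unmodified context logits, which mirrors the stated goal that the penalty decays to zero out-of-sample.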

2605.24562 2026-05-26 cs.CV cs.AI

PALoRA: Projection-Adaptive LoRA for Preserving the Reasoning Ability of Large Language Models

Mustafa Hayri Bilgin, Mariam Barry, Albert Bifet, Azzedine Idir Ait Said, Soumya Banerjee

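AI summary: Proposes the PALoRA framework, which uses Singular Value Fine-Tuning (SVF) to identify reasoning-critical components and injects knowledge with LoRA under an orthogonality constraint, updating factual knowledge efficiently while preserving reasoning ability.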

Abstract

Efficiently updating Large Language Models (LLMs) with new or evolving factual knowledge remains a central challenge, as even parameter-efficient adaptation can erode previously acquired reasoning abilities. This tension reflects a plasticity-stability dilemma: models must incorporate new knowledge while preserving skill-critical representations. In this work, we study this trade-off through the spectral structure of multilayer perceptron weight matrices. We show, both theoretically and empirically, that information essential for reasoning is not localized only in dominant singular directions, but is instead distributed across the singular spectrum. Motivated by this observation, we introduce PALoRA, a two-stage framework for knowledge injection with reduced interference. PALoRA first trains a Singular Value Fine-Tuning (SVF) expert on a reasoning dataset and uses its learned singular scaling vector as a frozen geometric probe to identify components that are critical for the target skill. It then performs factual knowledge injection with Low-Rank Adaptation (LoRA) under a structural orthogonality constraint, ensuring that updates avoid the identified skill-relevant subspace. Across Llama 3.1 8B and Mistral 7B, and across mathematical, coding, and scientific reasoning benchmarks, PALoRA preserves on average 95% of the SVF expert's reasoning performance while maintaining competitive factual recall. It consistently improves skill retention over prior spectral Parameter-Efficient Fine-Tuning (PEFT) methods while adding less than 0.006% parameter overhead.
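
One generic way to keep an update away from a protected subspace is to project out its components along that subspace. The abstract does not say how PALoRA enforces its structural orthogonality constraint, so the following is only a vector-level illustration of the geometric idea. The function names and the example basis are hypothetical, and the basis is assumed to be orthonormal.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def project_out(update, protected_basis):
    """Remove from `update` its components along each protected direction.

    protected_basis must be an orthonormal list of vectors; the result is
    then orthogonal to every vector in it.
    """
    result = list(update)
    for direction in protected_basis:
        coeff = dot(update, direction)
        result = [r - coeff * d for r, d in zip(result, direction)]
    return result


# Hypothetical example: protect the first coordinate axis.
protected = [[1.0, 0.0, 0.0]]
raw_update = [3.0, 4.0, 5.0]
safe_update = project_out(raw_update, protected)
print(safe_update)
print(dot(safe_update, protected[0]))
```

In a real setting the protected directions would come from the singular vectors flagged as skill-critical, and the projection would act on weight-matrix updates instead of a single vector.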

2605.24548 2026-05-26 cs.LG math.PR

Deep ZakaiJ: Structured Filtering for Jump-Diffusion Time Series Forecasting

Yan Leng, Thibaut Mastrolia, Hao Wang

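AI summary: Proposes Deep ZakaiJ, which embeds the Zakai nonlinear filtering equation into a neural encoder-decoder architecture and updates the latent-state belief via Strang splitting for partially observed jump-diffusion systems. It improves distributional forecasts on synthetic, financial, and oceanographic datasets while remaining competitive in point accuracy.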

Abstract

Time series driven by unobserved latent states frequently exhibit abrupt jump discontinuities whose timing and magnitude cannot be predicted from observed history alone. Classical jump-diffusion models offer a principled mathematical framework but assume rigid parametric forms, while recent neural jump models operate on fully observed trajectories without inferring the hidden states that govern the dynamics. We propose Deep ZakaiJ, a latent-state model for partially observed jump-diffusion systems that embeds the Zakai nonlinear filtering equation into a neural encoder-decoder architecture. The encoder recursively updates a belief over the latent state via Strang splitting into three interpretable substeps: prior propagation, diffusion innovation, and jump innovation, yielding a differentiable, first-order-accurate approximation of the exact filtering evolution. The decoder is a structured jump-diffusion model explicitly conditioned on the filtered belief, preserving the separation between continuous dynamics and discontinuous shocks. On synthetic, financial, and oceanographic datasets, Deep ZakaiJ improves distributional forecasts while remaining competitive in point accuracy, achieving calibrated predictive intervals and recovering interpretable latent structure in synthetic and qualitative case studies.
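
Strang splitting advances a system by composing simpler sub-flows symmetrically: half steps of the outer operators, a full step of the innermost one, then the half steps again in reverse order. The sketch below shows that composition for three generic sub-flows. The labels borrow the substep names from the abstract, but which operator sits in the middle and the tracer functions are assumptions; they are placeholders, not the paper's filter updates.

```python
def strang_step(flows, state, h):
    """Symmetric composition of sub-flows, each a function (state, dt) -> state.

    All flows except the last take a half step, the last takes a full step,
    and the earlier ones then take another half step in reverse order.
    """
    *outer, inner = flows
    for flow in outer:
        state = flow(state, h / 2)
    state = inner(state, h)
    for flow in reversed(outer):
        state = flow(state, h / 2)
    return state


def make_tracer(name):
    """Placeholder sub-flow that only records which substep ran and for how long."""
    def flow(state, dt):
        return state + [(name, dt)]
    return flow


flows = [make_tracer("prior"), make_tracer("diffusion"), make_tracer("jump")]
for name, dt in strang_step(flows, [], h=1.0):
    print(name, dt)
```

Replacing each tracer with a real update (for instance, a belief propagation step) keeps the same call order, which is the part of the scheme this sketch is meant to show.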

2605.24547 2026-05-26 cs.LG

DemoEvolve: Overcoming Sparse Feedback in Agent Harness Evolution with Demonstrations

Lirong Che, Yuzhe Yang, Peiwen Lin, Chuang Wang, Xueqian Wang, Jian Su

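AI summary: Proposes DemoEvolve, which uses human demonstrations to guide harness evolution, addressing the fragility of self-generated rollouts under sparse feedback and high variance in long-horizon stochastic environments. Its effectiveness is validated on Liar's Dice and Balatro.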

Abstract

Agent harness evolution improves frozen language-model agents by modifying the executable structures around them. We study this paradigm as a form of sample-efficient fast adaptation: instead of updating model weights, an agent can acquire task-specific competence by changing its external harness, while leaving the base model's general capabilities intact. Prior work shows that self-generated rollouts can support harness search, suggesting that agents may acquire new task competence through practice. Yet in long-horizon stochastic environments, self-practice becomes fragile: rewards are sparse, outcomes are high-variance, and failures are hard to attribute to concrete harness mechanisms. We introduce DemoEvolve, a demonstration-bootstrapped approach to harness evolution. When reward-only search is too broad and noisy, competent human trajectories serve as expert reference experience for the coding proposer, guiding harness-level diagnosis and editing. Experiments on Liar's Dice show that self-rollout evolution can work when episodes are short and failures are attributable. In contrast, Balatro exposes a harder long-horizon stochastic regime, where self-rollout evolution is misled by sparse feedback and candidate-selection noise, while tutorial-like textual knowledge alone does not yield stable improvement. Under the same limited budget, DemoEvolve produces more effective and auditable harness edits and achieves better performance. Overall, demonstrations make sparse-feedback harness evolution more diagnosable, localizable, and stable.

2605.24534 2026-05-26 cs.CL

NudgeVAD: Language-Guided End-to-End Driving via FiLM Residuals

Chieh-Chi Yang, Yu-Hsiang Chen, Yi-Ting Chen

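AI summary: Proposes the NudgeVAD framework, which uses language as a calibrated nudge signal and, through identity-initialized FiLM and a zero-initialized residual head, substantially improves driving trajectory prediction when commands are unreliable.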

Comments Technical report for the doScenes Instructed Driving Challenge, CVPR 2026 DriveX Workshop. 1st place in the Ablation track

Abstract

Natural-language instructions promise controllable end-to-end driving, but their benefit can be hidden when planners already receive reliable high-level commands. We propose NudgeVAD, a frozen-planner residual framework that uses language as a calibrated nudge to a VAD trajectory. With identity-initialized FiLM and a zero-initialized residual head, NudgeVAD is equivalent to the frozen planner at initialization, so learned deviations arise only from language-conditioned residuals. We evaluate NudgeVAD along a command-reliability axis. With reliable commands, language improves the initial planner but becomes nearly redundant once compared against VAD-FT (UNCOND), a compute-matched VAD model fine-tuned without language. With random commands, however, language becomes essential: detaching text degrades ADE6s to 3.166 m, while NudgeVAD with text recovers 2.806 m and outperforms VAD-FT (UNCOND) by 0.312 m. These results show that language is not universally additive; it is most valuable when the categorical command channel is unreliable.
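
The claim that the model equals the frozen planner at initialization follows from two choices: FiLM starts as the identity (scale 1, shift 0), and the residual head starts at zero. The sketch below checks this on plain Python lists. The class name, layer shapes, and inputs are hypothetical and much simpler than a real driving stack.

```python
def linear(weights, bias, x):
    """Dense layer: weights is a list of rows, one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


class LanguageNudge:
    """Hypothetical residual head on top of a frozen planner's trajectory."""

    def __init__(self, feat_dim, text_dim, traj_dim):
        # FiLM generators start at zero, so gamma = 1 and beta = 0 (identity).
        self.w_gamma, self.b_gamma = zeros(feat_dim, text_dim), [0.0] * feat_dim
        self.w_beta, self.b_beta = zeros(feat_dim, text_dim), [0.0] * feat_dim
        # Zero-initialized residual head: the predicted correction starts at 0.
        self.w_res, self.b_res = zeros(traj_dim, feat_dim), [0.0] * traj_dim

    def film(self, features, text_emb):
        gamma = [1.0 + g for g in linear(self.w_gamma, self.b_gamma, text_emb)]
        beta = linear(self.w_beta, self.b_beta, text_emb)
        return [g * f + b for g, f, b in zip(gamma, features, beta)]

    def forward(self, base_trajectory, features, text_emb):
        residual = linear(self.w_res, self.b_res, self.film(features, text_emb))
        return [t + r for t, r in zip(base_trajectory, residual)]


nudge = LanguageNudge(feat_dim=3, text_dim=2, traj_dim=4)
base = [1.0, 2.0, 3.0, 4.0]
out = nudge.forward(base, features=[0.5, -1.0, 2.0], text_emb=[0.3, 0.7])
print(out == base)  # True at initialization
```

Any deviation from the base trajectory can therefore only appear once training moves the FiLM generators or the residual head away from zero.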
