arXivDaily: daily arXiv research digest, updated Monday through Friday
2606.03142 2026-06-03 cs.CV

Disentangling Visual and Factual Correctness in LVLMs' Visualization Literacy

Soohyun Lee, Jaeyoung Kim, Seokhyeon Park, Sihyeon Lee, Jiwon Song, Bohyoung Kim, Hyunjoo Song, Jinwook Seo

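AI summary: Proposes a framework that separates visual correctness from factual correctness, and uses counterfactual tests and arbitration metrics to show that LVLMs can rely on factual recall rather than visual reasoning in visualization literacy assessments.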

Details
Comments
Under review at IEEE Transactions on Visualization and Computer Graphics (TVCG). 23 pages, 9 figures
Abstract

Large Vision-Language Models (LVLMs) show strong visualization interpretation, yet it is unclear whether their responses reflect genuine reasoning over visual evidence or factual priors learned during training. Current evaluations mix these two sources, obscuring when correct visual interpretation is overridden by memorized facts. We present a framework that isolates visual correctness from factual correctness, revealing validity limitations in existing visualization literacy assessments. Across three experiments with 15 state-of-the-art LVLMs: (1) several models reach human-level performance on standard tests (VLAT), but this may reflect factual recall rather than visual understanding, while randomized-data tests (reVLAT) underestimate literacy when correct visual interpretation is superseded by factual priors. (2) Using our Counterfactual Visualization Literacy Assessment Test (CVLAT) with capability-normalized arbitration metrics, we classify models by the sign of their visual-factual reliance index (VFRI), revealing a visualization-oriented majority and a factual knowledge-oriented minority, though several near-zero cases warrant caution. A human baseline (N=30) on the same counterfactual items confirms that people overwhelmingly follow the chart under conflict, providing a human reference point. (3) Prompt-based intervention can shift prioritization, but its effectiveness is highly model-dependent and direction-asymmetric, and high chart-reading capability does not predict prompt-controllability. Overall, high visualization accuracy is not sufficient evidence of faithful visual reasoning: reliable integration into visual analytics requires evaluating not only visualization literacy but also how models arbitrate between visual evidence and factual priors when the two diverge. Benchmark and code: https://github.com/JaeyoungKim-HCIL/CVLAT

2606.03137 2026-06-03 cs.AI

Think-Before-Speak: From Internal Evaluation to Public Expression in Multi-Agent Social Simulation

Kaiqi Yang, Tai-Quan Peng, Sanguk Lee, Hui Liu

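AI summary: Proposes the TBS framework, which separates agents' internal reasoning from public utterance generation to simulate the pathway from internal evaluation to public expression, and validates its mechanism sensitivity in climate policy discussions.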

Details
Abstract

LLM-based multi-agent simulation offers a promising way to study social interaction, deliberation, and collective opinion dynamics. However, many existing dialogue simulation frameworks represent interaction mainly as observable turn exchange or aggregated outputs, leaving the internal evaluative processes behind silence, speaking intention, and public expression difficult to examine. We introduce TBS (Think-Before-Speak), an interval-based multi-agent simulation framework that separates agents' private reasoning from public utterance generation. At each interval, all agents update structured internal states based on the shared dialogue history and their own memory. These states include dissonance-related appraisal, perceived opinion climate, perceived isolation risk, response strategy, and willingness to speak. The orchestrator then resolves competing speaking intentions and commits one utterance to the public dialogue, allowing internal evaluation and public interaction to co-evolve over time. We evaluate TBS in simulated town hall discussions on a climate-related policy issue. Results show that TBS produces coherent internal-state traces and that these traces vary systematically across turn-allocation, silence, and memory conditions. Dissonance-related appraisal increases agents' willingness to speak, whereas silence-pressure appraisal decreases it. Once speaking intention is formed, public expression is shaped mainly by turn-allocation rules. These findings suggest that TBS supports mechanism-sensitive social simulation by making the pathway from internal evaluation to public expression observable and analyzable.
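
The turn-resolution step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' orchestrator: the function name `resolve_turn`, the willingness threshold, and the two allocation rules (`max` and `random`) are assumptions introduced here.

```python
import random


def resolve_turn(intents, rule="max", threshold=0.5, rng=None):
    """Pick at most one speaker for the current interval.

    intents: dict mapping agent name -> willingness to speak in [0, 1].
    Returns the chosen agent name, or None if everyone stays silent.
    All names and the threshold are hypothetical, for illustration only.
    """
    # Agents below the threshold keep their evaluation private (silence).
    willing = {a: w for a, w in intents.items() if w >= threshold}
    if not willing:
        return None
    candidates = sorted(willing)  # sorted for deterministic tie-breaking
    if rule == "max":
        # Highest willingness wins; ties go to the first name alphabetically.
        return max(candidates, key=lambda a: willing[a])
    if rule == "random":
        # Uniform choice among all agents who want to speak.
        rng = rng or random.Random(0)
        return rng.choice(candidates)
    raise ValueError(f"unknown turn-allocation rule: {rule}")
```

Separating the private willingness scores from the single committed utterance is what lets an analysis compare "wanted to speak" against "actually spoke" under different allocation rules.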

2606.03136 2026-06-03 cs.CR cs.CL

PsychoPass: Geometric Profiling of Multi-Turn Adversarial LLM Conversations

Muberra Ozmen, Subhabrata Majumdar

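AI summary: Proposes the PsychoPass framework, which extracts geometric features of conversation trajectories in embedding space to predict multi-turn jailbreak attacks before harmful content is generated, and finds that the early geometric signal is robust.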

Details
Abstract

Multi-turn jailbreak attacks on large language models (LLMs) reveal a mismatch in current guardrails: they operate on individual turns, while attacks unfold as trajectories across conversations. We propose a shift from content to dynamics, modeling conversations as paths in representation space and asking whether adversarial intent is encoded early in their geometry. We introduce PsychoPass, a framework that extracts geometric features from conversation trajectories in embedding space to predict a potential attack before harmful content is produced. These features achieve near-perfect performance in naïve classifiers, which is largely explained by the inclusion of number of turns as a feature. After removing this confound, a smaller but consistent geometric signal remains, with classification performance that does not depend meaningfully on encoder choice. Crucially, this signal appears early in the conversation: attack outcomes remain above chance from short prefixes alone, more reliably than baseline guardrails. A supporting theoretical analysis explains these findings via a decomposition of length and shape, a detection bound based on prefix length, and encoder invariance. Together, these results show that adversarial conversations leave an early, representation-robust geometric fingerprint suitable for online monitoring.
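
To make "geometric features of a conversation trajectory" concrete, here is a minimal sketch. The three features (straightness, mean turning angle, mean step length) are assumptions chosen for illustration and are not claimed to be the paper's feature set; they are normalized per step so that they do not simply encode the number of turns, which the abstract identifies as a confound.

```python
import math


def trajectory_features(embeddings):
    """Shape features of a conversation prefix in embedding space.

    embeddings: list of per-turn embedding vectors (lists of floats),
    in conversation order. Feature choice is illustrative only.
    """
    # Step vectors between consecutive turns and their lengths.
    steps = [[b - a for a, b in zip(u, v)]
             for u, v in zip(embeddings, embeddings[1:])]
    norms = [math.sqrt(sum(x * x for x in s)) for s in steps]
    path_length = sum(norms)

    # Net displacement from the first turn to the last.
    net = [b - a for a, b in zip(embeddings[0], embeddings[-1])]
    displacement = math.sqrt(sum(x * x for x in net))

    # Turning angle between each pair of consecutive non-zero steps.
    angles = []
    for i in range(len(steps) - 1):
        if norms[i] > 0 and norms[i + 1] > 0:
            cos = sum(a * b for a, b in zip(steps[i], steps[i + 1]))
            cos /= norms[i] * norms[i + 1]
            angles.append(math.acos(max(-1.0, min(1.0, cos))))

    return {
        "straightness": displacement / path_length if path_length > 0 else 0.0,
        "mean_turn": sum(angles) / len(angles) if angles else 0.0,
        "mean_step": path_length / len(steps) if steps else 0.0,
    }
```

Calling the function on growing prefixes of the same conversation gives one feature vector per prefix length, which is the kind of input an early-warning classifier would need.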

2606.03135 2026-06-03 cs.AI

Uncertainty-Aware Clarification in LLM Agents with Information Gain

Mengyi Deng, Zhiwei Li, Xin Li, Tingyu Zhu, Ying Zhao, Zhijiang Guo, Wei Wang

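AI summary: To address erroneous tool actions by LLM agents caused by underspecified user instructions, proposes a clarification framework guided by an information gain reward, which quantifies the utility of clarification questions through Bayesian belief updates and trains the agent to produce high-information-gain clarifications; in the τ-Bench environment it raises the task success rate by 3.7% while adding only 0.3 interaction steps.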

Details
Journal ref
ICML 2026
Abstract

Large Language Model (LLM) agents often operate under underspecified user instructions, where latent uncertainty over user intent leads to erroneous tool actions. To address this challenge, we propose a goal-oriented clarification framework that aligns clarification behavior with ambiguity resolution. Central to our approach is the Information Gain Reward, a metric that quantifies the utility of clarification questions by measuring the Bayesian belief update towards the ground-truth goal induced by the clarification exchange. We train the clarifier (LLM) using this reward to optimize for high information gain, ensuring that clarifications effectively reduce uncertainty and improve task completion within the agent-tool-user environment. We validate our framework within a clarification-enhanced τ-Bench environment, conducting cross-agent evaluations across five heterogeneous backbones. Empirical results demonstrate that our method consistently improves the success rate by 3.7% over the no-clarification baseline, while adding only 0.3 total interaction steps on average.
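
One plausible reading of a "Bayesian belief update towards the ground-truth goal" is the change in log-probability that the belief assigns to the true goal after the clarification exchange. The sketch below works under that assumption; the abstract does not give the exact formula, and the discrete goal set, the `likelihood` table, and both function names are hypothetical.

```python
import math


def bayes_update(prior, likelihood):
    """Posterior over candidate goals after observing the user's answer.

    prior: dict goal -> P(goal) before the clarification.
    likelihood: dict goal -> P(answer | goal); missing goals count as 0.
    """
    unnorm = {g: p * likelihood.get(g, 0.0) for g, p in prior.items()}
    z = sum(unnorm.values())
    if z == 0.0:
        raise ValueError("answer has zero probability under every goal")
    return {g: p / z for g, p in unnorm.items()}


def information_gain_reward(prior, likelihood, true_goal):
    """Log-probability gained on the true goal (assumed reward form)."""
    posterior = bayes_update(prior, likelihood)
    if posterior[true_goal] == 0.0:
        return float("-inf")  # the exchange ruled out the true goal
    return math.log(posterior[true_goal]) - math.log(prior[true_goal])
```

Under this form, a question whose answer is equally likely under every goal earns a reward of zero, and a question that halves the set of plausible goals while keeping the true one earns log 2.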

2606.03134 2026-06-03 cs.RO cs.LG

How Visible Are Silent Manipulation Failures? An Observability Study of False-Success Detection in Simulated Robot Episodes

Aarav Bedi

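AI summary: Using simulated bimanual ALOHA tasks, this study examines how recoverable false successes (episodes that actually failed but were judged successful) are among episodes that the robot's own success detector flagged as successful. It finds that a detector using joint data recovers almost all false successes in cube transfer but only part of them in peg insertion, that a vision detector closes most of the gap, and that separability rests on velocity differences far below realistic sensor noise.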

Details
Comments
4 pages, 3 figures
Abstract

Imitation-learning policies for robot manipulation inherit the quality of the success labels attached to their training episodes, and those labels are usually produced by the robot's own success check. A particularly damaging error is the false success: an episode the robot logs as a success when the task outcome was actually wrong. We ask a narrow but practical question about these episodes. Once an episode has already been flagged as a success, how much of the information needed to overturn that label is present in proprioception, and how much requires vision? We build a simulated testbed on two bimanual ALOHA tasks, induce failures through environment perturbations rather than label edits, label every episode by privileged simulator state that the detector never sees, and keep only episodes the robot flagged as successful. We then compare detectors restricted to proprioception against a vision-based detector. We find that recoverability spans a wide range: in cube transfer the false successes are almost fully recoverable from joint data alone, while in peg insertion proprioception recovers only part of them and a vision detector closes most of the gap. We also show that the proprioceptive separability we measure rests on velocity differences far below any realistic sensor noise floor, so it is best read as an optimistic upper bound that a noiseless simulator inflates. We release the generation and evaluation pipeline.

2606.03132 2026-06-03 cs.CL

DMT-CBT: Longitudinal Therapeutic State Modeling for CBT Counseling

Chang Liu, Shuyi Zhang, Changsheng Ma, Yongfeng Tao, Minqiang Yang, Bin Hu

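AI summary: Proposes the DMT-CBT framework, which uses structured therapeutic states maintained across sessions, multimodal behavioral grounding, and tool-augmented intervention to address the reduction of CBT counseling to local response generation in existing methods, enabling dynamic modeling of longitudinal therapeutic states.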

Details
Abstract

Large language models (LLMs) have shown growing potential for Cognitive Behavioral Therapy (CBT) counseling. However, most existing approaches still formulate counseling as a local response generation problem, focusing on empathetic replies within short, text-only, or single-session interactions. We argue that this formulation fundamentally mismatches the nature of real psychotherapy. In clinical CBT, therapy is a longitudinal process in which therapists continuously infer, update, and intervene on evolving therapeutic states across sessions. Realistic CBT further involves multimodal inference and delayed cross-session intervention effects, requiring models to capture longitudinal therapeutic state evolution under partial observability. We propose DMT-CBT, a framework for Dynamic Modeling of evolving Therapeutic states in CBT counseling. DMT-CBT maintains structured therapeutic states across sessions while incorporating multimodal behavioral grounding and tool-augmented intervention to support adaptive therapeutic reasoning. Based on this framework, we construct DMTCorpus, a synthetic multi-session multimodal CBT counseling dataset featuring evolving therapeutic states, image-grounded client behaviors, and cross-session intervention continuity. Experimental results show that DMT-CBT improves counseling fidelity and therapeutic alliance, produces more favorable longitudinal affective trajectories, and preserves therapeutic states more faithfully than post-hoc extraction approaches.

2606.03131 2026-06-03 cs.LG

HARVE: Hacking-Aware Reward-Head Vector Editing for Robust Reward Models

Shuang Liu, Yuxuan Bo, Qiuyang Zhao, Caiyue Huang, Xiaorong Chen, Yanguang Liu, Mengnan Du

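AI summary: To address reward models' vulnerability to reward hacking, proposes HARVE, a training-free reward-head editing method that removes the components of the reward-head vector aligned with a hacking-related subspace, improving robustness while preserving general capability.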

Details
Abstract

Reward models are central to large language model (LLM) alignment, but they remain vulnerable to reward hacking. To evaluate reward-model robustness, we introduce RewardHackBench, which contains 13 reward-hacking patterns covering real-life high-stakes domains and general settings, and we find severe failures on specific subcategories across eight reward models. To mitigate these failures, we propose HARVE, a training-free reward-head editing method for scalar reward models. Instead of fine-tuning the reward model, HARVE identifies a multi-directional hacking subspace from residual stream directions associated with selected hacking subcategories, and removes the component of the reward-head vector aligned with that subspace. This directly reduces the reward head's sensitivity to hacking-related features using only a small set of contrastive gold-hacked examples, without gradient updates or fine-tuning. Comprehensive experiments across eight reward models indicate that HARVE improves hacking robustness, outperforms fine-tuning baselines, and preserves the reward models' general capability. Further analyses suggest that reward hacking is better captured as a multidimensional residual-space structure than by isolated surface cues.
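
The core linear-algebra step, removing the part of a reward-head vector that lies in a given subspace, can be sketched as an orthogonal projection. This is a minimal pure-Python illustration under the assumption that the hacking directions are already available as plain vectors; how HARVE actually extracts them from the residual stream is not shown, and the function names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(directions, tol=1e-10):
    """Gram-Schmidt: orthonormal basis for the span of the directions."""
    basis = []
    for d in directions:
        v = list(d)
        for q in basis:
            c = dot(v, q)
            v = [x - c * y for x, y in zip(v, q)]
        norm = dot(v, v) ** 0.5
        if norm > tol:  # skip directions already (nearly) in the span
            basis.append([x / norm for x in v])
    return basis


def edit_reward_head(w, hacking_directions):
    """Return w minus its projection onto the span of hacking_directions."""
    edited = list(w)
    for q in orthonormal_basis(hacking_directions):
        c = dot(edited, q)
        edited = [x - c * y for x, y in zip(edited, q)]
    return edited
```

After the edit, the head's score `dot(edited, h)` no longer changes when a hidden state `h` moves along any of the removed directions, which is the sense in which sensitivity to those features is reduced without any gradient update.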

2606.03130 2026-06-03 cs.LG

Synthetic Hallucinations, Real Gains: Hard Negatives from Frontier Models for FIM Hallucination Mitigation

Mahdi Erfanian, Nelson Daniel Troncoso, Aashna Garg, Amabel Gale, Xiaoyu Liu, Pareesa Ameneh Golnari, Shengyu Fu

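AI summary: To address Fill-in-the-Middle (FIM) hallucinations produced by small open-source code models in IDE autocomplete, proposes an execution-free alternative: frontier code models synthesize plausible-but-wrong completions as hard negatives, and the contrast between these synthetic hallucinations and real developer edits serves as a supervised fine-tuning signal, improving exact match on the Delulu benchmark by 18.8 points.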

Details
Abstract

Small open-source code models that power IDE autocomplete still emit hallucinated Fill-in-the-Middle (FIM) completions: syntactically natural calls to methods, parameters, variables, and imports that do not exist in the surrounding project. Existing mitigations either require per-language execution sandboxes that do not apply at mid-keystroke or preference-optimisation pipelines that need large human-labelled corpora. We propose an execution-free alternative: use frontier code models to synthesise plausible-but-wrong completions as hard negatives, then leverage the contrast between these synthetic hallucinations and the ground-truth developer edit as a supervised fine-tuning signal. Our pipeline scrapes multilingual FIM contexts from public GitHub across eight languages and asks a panel of three frontier generators to produce one hard negative per context for each of four hallucination types drawn from the Delulu taxonomy, a Docker-verified multilingual FIM hallucination benchmark, yielding a paired chosen/rejected dataset. Fine-tuning Qwen2.5-Coder-7B-Instruct on a 100K-row curated subset lifts Delulu exact match by +18.8 points and edit similarity by +0.22 on every language and every type, while also improving every HumanEval-Infilling split and every SAFIM subset. The same recipe at 3B lifts Delulu by +12.8 EM with a small, characterised general-FIM trade-off. Five-axis ablations (size, type mix, language coverage, base-model family, and a difficulty-aware fool rate) plus a head-to-head SFT vs. DPO/ORPO comparison map which design choices drive the gain. We release the full pipeline source code (generation, fool-rate LLM judging, curation, and the FIM fine-tuning recipe) so that the experiments in this paper can be reproduced end-to-end on any permissively licensed corpus.

2606.03128 2026-06-03 cs.CR cs.AI cs.CL cs.LG

Decoupled Smart Contract Audits: Lightweight LLM Framework via Distillation and Aggregation

Bagus Rakadyanto Oktavianto Putra, Muhamad Risqi Utama Saputra, Widyawan, Guntur Dharma Putra

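AI summary: Proposes a decoupled smart contract audit framework built on lightweight open-source LLMs (0.6B-4B parameters) that uses rsLoRA, knowledge distillation, and a Chain-of-Verification aggregation strategy to reach 98.25% accuracy in vulnerability detection, outperforming 7B-34B parameter models.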

Details
Comments
12 pages, 4 figures, 5 tables. Accepted to IEEE ICWS 2026
Abstract

Smart contracts face critical security challenges that require thorough auditing in decentralized web services. While Large Language Models (LLMs) have shown promise in automated vulnerability detection, existing approaches lack severity evaluations with actionable remediation and demand unnecessarily massive computational overhead. In this study, we introduce an efficient end-to-end smart contract security audit framework utilizing lightweight, highly optimized open-source LLMs (0.6B-4B parameters). Our framework decouples comprehensive audit tasks into four interconnected components: vulnerability detection, explanation, severity classification, and remediation recommendation. To maintain high accuracy without massive parameters, we implement Rank-Stabilized Low-Rank Adapters (rsLoRA), knowledge distillation, and a custom Chain-of-Verification (CoVe) aggregation strategy to systematically screen and consolidate multiple draft responses from the model into a highly accurate audit report. Experimental results demonstrate that our lightweight pipeline consistently outperforms state-of-the-art open-source coder dense LLMs (7B to 34B parameters), achieving 98.25% accuracy in vulnerability detection and an alignment score of 0.4375 in generative explanation tasks. Furthermore, our extensive ablation studies empirically validate the superiority of our decoupled audit processes over unified prompting and uncover a novel severity centrality bias, establishing a critical benchmark for future research in LLM-assisted auditing.

2606.03127 2026-06-03 cs.RO

TTT-VLA: Test-Time Latent Prompt Optimization for Vision-Language-Action Models

Wenbo Zhang, Jianxiong Li, Shuai Yang, Sijin Chen, Jiajun Liu, Lingqiao Liu, Xiao Ma

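AI summary: Proposes the TTT-VLA framework, which adapts to distribution shift by optimizing a latent prompt at test time without modifying the policy itself, improving task success rates in single- and multi-embodiment settings on SimplerEnv.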

Details
Abstract

Vision-Language-Action (VLA) models trained on large-scale data have made remarkable progress, but they remain vulnerable to distribution shifts at deployment time. Recent VLA models suggest that prompts can serve as an efficient interface for steering policy behavior, but existing prompt-based steering typically relies on external guidance. This raises a natural question: can test-time training (TTT) for VLA be achieved by optimizing a prompt, so that the steering interface itself can be learned and adapted from interaction? We address this question with TTT-VLA, a test-time training framework based on Latent Prompt Optimization (LPO). During training, the latent prompt is learned with an additional proxy task, providing an extra learned conditioning signal for policy learning. At test time, TTT is performed by collecting interaction data from the current environment and optimizing only the latent prompt on those data using the proxy task's self-supervised signal, without modifying the policy itself. Experiments on SimplerEnv demonstrate that the proposed method consistently improves task success rates in both single- and multi-embodiment settings. Further analysis shows that the gains arise primarily from correcting a small number of critical decisions rather than globally altering policy behavior. These results suggest that LPO provides an effective and practical pathway for deployment-time improvement of foundation manipulation policies.

2606.03125 2026-06-03 cs.LG

Rethinking Neural Width for Alternating Current Optimal Power Flow Proxies

Dhruvi Khandelwal, Anurag Basistha, Ayushi Jolotia, Parikshit Pareek

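AI summary: This paper proposes the Loss-Guided Neural Densification algorithm, which grows network capacity incrementally to minimize width while accurately approximating the ACOPF manifold, and matches baseline performance on several IEEE systems with up to ten times fewer neurons per layer.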

Details
Abstract

Deep learning proxies for Alternating Current Optimal Power Flow (ACOPF) lack systematic methods for determining architectural size. This paper conducts a constructive thought experiment to answer a fundamental inquiry: how wide must a neural network be to almost accurately approximate the ACOPF manifold? We introduce a Loss-Guided Neural Densification (LG-ND) algorithm that incrementally discovers necessary capacity by expanding only when the current deep neural network topology fails to improve further. Empirical results across various IEEE systems show that LG-ND achieves performance parity with literature baselines using up to ten times fewer neurons per layer. Such architectural minimalism is critical for the formal verification required in safety-critical grid operations.
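
The "expand only when the current topology fails to improve further" idea can be sketched as a plateau-triggered growth loop. This is a schematic control loop, not the LG-ND algorithm itself: the `train_epoch` callback, the patience rule, and all parameter names and defaults are assumptions, and real use would carry trained weights over when widening.

```python
def loss_guided_growth(train_epoch, width, target_loss, grow_by=2,
                       max_width=64, patience=2, min_delta=1e-4,
                       max_epochs=1000):
    """Widen a network only when its loss has stopped improving.

    train_epoch(width) -> loss after one more epoch at that width
    (hypothetical callback supplied by the caller).
    Returns the final width and the (width, loss) history.
    """
    best = float("inf")
    stale = 0
    history = []
    for _ in range(max_epochs):
        loss = train_epoch(width)
        history.append((width, loss))
        if loss <= target_loss:
            break  # current capacity is sufficient
        if best - loss > min_delta:
            best, stale = loss, 0  # still improving at this width
        else:
            stale += 1
        if stale >= patience:
            if width >= max_width:
                break  # plateaued and no capacity left to add
            width = min(width + grow_by, max_width)
            best, stale = float("inf"), 0  # restart plateau detection
    return width, history
```

With a toy callback such as `lambda w: 1.0 / w` and a target of 0.1, the loop spends a few epochs at each width and stops at the first width whose loss reaches the target, so the returned width is the smallest one tried that suffices.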

2606.03121 2026-06-03 cs.LG

TiWeaver: Unified Temporal Dynamics Modeling via Contextual Patching

Zhe Li, Jindong Tian, Hao Miao, Zhi Lei, Chenjuan Guo, Bin Yang

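AI summary: To address the complex dynamics and asynchronous inter-channel dependencies that irregularities such as missing values and non-uniform sampling cause in multivariate time series, proposes the TiWeaver framework, which models them adaptively through a Graph-Guided Adaptive Tokenizer (G²AT) and a Fine-grained Asynchronous Dependency Extractor (FADE), achieving gains of up to 25% on 12 datasets.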

Details
Abstract

Multivariate time series forecasting plays a critical role in real-world applications, including weather prediction, stock analysis, and health monitoring. Due to the diversity of data sources, time series exhibit diverse temporal dynamics, often accompanied by various irregularities such as missing values and non-uniform sampling frequencies. Such irregularities lead to complex and asynchronous temporal dependencies across channels. Thus, a single model with a fixed patching scheme often fails to adapt well to diverse multivariate time series, hindering accurate forecasting. In this paper, we propose TiWeaver, a unified framework designed to handle temporal dynamics and fine-grained inter-channel dependencies adaptively. Specifically, we introduce a Graph-Guided Adaptive Tokenizer (G²AT) that divides time series into highly contextually coherent patches by jointly considering temporal density and representation consistency. In addition, we propose a Fine-grained Asynchronous Dependency Extractor (FADE), which is designed to model fine-grained asynchronous inter-channel dependencies while incorporating long-term historical dependencies. We evaluate TiWeaver on 12 real-world time series datasets, where it achieves state-of-the-art performance, outperforming existing methods by up to 25%. These results demonstrate its robustness and effectiveness across diverse domains and data characteristics.

2606.03120 2026-06-03 cs.CV

KC-3DGS: Kurtosis-Constrained Gaussian Splatting for High-Fidelity View Synthesis

Vivekjyoti Banerjee, Abhay Yadav, Rama Chellappa, Aniket Roy

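AI summary: Proposes KC-3DGS, which adds a multi-scale alignment loss, a kurtosis concentration loss, and a cross-band covariance penalty in the wavelet domain to improve the perceptual quality of 3DGS, in particular high-frequency detail and structural artifacts under sparse views.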

Details
Abstract

3D Gaussian Splatting (3DGS) enables real-time novel view synthesis by representing scenes as collections of anisotropic Gaussians optimized via differentiable rasterization. However, standard pixel-space losses (L1, SSIM) constrain only aggregate reconstruction error, permitting the optimization to redistribute error across frequency scales. This leads to oversmoothing and structural artifacts, particularly in sparse-view settings where supervision is limited. We propose KC-3DGS, which augments 3DGS training with wavelet-domain supervision based on natural image statistics. Our method combines three components: (1) a multi-scale wavelet coefficient alignment loss that explicitly penalizes missing high-frequency detail, (2) a supervised kurtosis concentration loss that encourages rendered images to match the heavy-tailed frequency statistics of ground-truth images, and (3) a cross-band covariance penalty that promotes frequency specialization. We provide theoretical analysis showing that pixel-space losses admit a family of indistinguishable perturbations under wavelet redistribution, and that our joint objective excludes degenerate solutions. Experiments across MipNeRF360, Tanks&Temples, MVImgNet, DeepBlending, and WRIVA-ULTRRA demonstrate consistent improvements in perceptual quality. On the challenging WRIVA-ULTRRA outdoor dataset, KC-3DGS achieves a 9.48% improvement in DreamSim while also improving PSNR, SSIM, and LPIPS. In sparse-view settings with only 12 training images, our method improves PSNR by up to 0.5 dB on MipNeRF360 while maintaining perceptual quality. The approach integrates seamlessly into existing 3DGS pipelines as a plug-and-play regularization strategy.
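
As a rough illustration of matching heavy-tailed wavelet statistics, the sketch below computes one-level Haar detail coefficients of a 1D signal (for example one image row) and penalizes the gap in kurtosis between a rendered and a ground-truth signal. The 1D setting, the single decomposition level, and the absolute-difference form of the loss are simplifying assumptions, not the paper's formulation.

```python
def haar_detail(signal):
    """One-level Haar detail (high-pass) coefficients of a 1D signal.

    Pairs up neighbouring samples; a trailing unpaired sample is ignored.
    """
    s = 2 ** 0.5
    return [(signal[i] - signal[i + 1]) / s
            for i in range(0, len(signal) - 1, 2)]


def kurtosis(xs):
    """Non-excess kurtosis m4 / m2^2 (a Gaussian has value 3)."""
    n = len(xs)
    if n == 0:
        return 0.0
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / (m2 * m2) if m2 > 0 else 0.0


def kurtosis_loss(rendered, target):
    """Gap between detail-band kurtosis of rendered and target signals."""
    return abs(kurtosis(haar_detail(rendered)) - kurtosis(haar_detail(target)))
```

A signal with one sharp edge has mostly zero detail coefficients plus one large value, so its kurtosis is high; uniformly spread high-frequency error lowers the kurtosis, which is the kind of redistribution a pixel-space loss alone would not notice.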

2606.03119 2026-06-03 cs.CV cs.AI cs.LG

GuidedBridge: Training-freely Improving Bridge Models with Prior Guidance

Zehua Chen, Yucheng Yang, Binjie Yuan, Kaiwen Zheng, Jun S. Liu, Jun Zhu

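AI summary: Proposes the training-free Prior Guidance (PG) and frequency-modulated prior guidance (FMPG) methods, which strengthen prior exploitation in bridge models by contrasting a weak prior with the seen prior, and designs the cascaded framework CFG-FMPG for image in-painting; experiments show that the method consistently improves pre-trained bridge models across diverse image translation tasks.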

Details
Comments
ICML 2026
Abstract

Guidance methods, such as classifier-free guidance (CFG) and auto-guidance (AG), have advanced noise-to-data generation in diffusion models. Recently, bridge models have introduced a data-to-data generative process that can exploit an instructive clean prior. In this work, inspired by previous methods that use a quality difference between denoising results as guidance, we propose a training-free bridge guidance method, termed Prior Guidance (PG). Specifically, we introduce a weak prior, which is unseen during bridge pre-training, hindering prior exploitation and thereby degrading the denoising result. Then, we contrast it with the seen prior to highlight and enhance prior exploitation via a scaling factor. Moreover, we analyze the underlying mechanism of prior exploitation in the bridge process and design frequency-modulated prior guidance (FMPG), which tailors the guidance scale to low- and high-frequency bands coherent with bridge generative dynamics. To address prior exploitation in image in-painting, we develop a cascaded framework, CFG-FMPG, which first generates a noisy hidden representation via CFG and then exploits it as a generative prior with FMPG, combining their complementary strengths without compromising inference efficiency. Experiments demonstrate that our PG methods consistently improve pre-trained bridge models across diverse image translation tasks.
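
By analogy with classifier-free guidance, contrasting a weak-prior prediction with a seen-prior prediction "via a scaling factor" might look like the extrapolation below. The exact PG update is not given in the abstract, so this formula, the function name, and the flat-list representation of model outputs are all assumptions.

```python
def prior_guidance(pred_seen, pred_weak, scale):
    """CFG-style combination of two denoising predictions (assumed form).

    pred_seen: model output conditioned on the prior seen in pre-training.
    pred_weak: model output conditioned on the degraded, unseen prior.
    scale: guidance strength; 1.0 reproduces pred_seen, values above 1.0
           push further along the weak-to-seen direction.
    """
    return [w + scale * (s - w) for s, w in zip(pred_seen, pred_weak)]
```

A frequency-modulated variant would apply different `scale` values to the low- and high-frequency parts of the two predictions; that split is omitted here.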

2606.03118 2026-06-03 cs.LG cs.CV q-bio.NC

Learning to See via Epiretinal Implant Stimulation in silico with Model-Based Deep Reinforcement Learning

Jacob Lavoie, Marwan Besrour, William Lemaire, Jean Rouat, Réjean Fontaine, Eric Plourde

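AI summary: This study proposes using isotropic and anisotropic shapes, assembled by deep reinforcement learning, to render intelligible images on the retina of a virtual patient, with the aim of improving the acuity of artificially restored vision.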

Details
Journal ref
Biomed. Phys. Eng. Express 10 (2024) 025006
Comments
18 pages, 6 figures. Published version: Biomed. Phys. Eng. Express 10, 025006 (2024)
Abstract

Objective: Diseases such as age-related macular degeneration and retinitis pigmentosa cause the degradation of the photoreceptor layer. One approach to restore vision is to electrically stimulate the surviving retinal ganglion cells with a microelectrode array such as an epiretinal implant. Epiretinal implants are known to generate visible anisotropic shapes elongated along the axon fascicles of neighboring retinal ganglion cells. Recent work has demonstrated that to obtain isotropic pixel-like shapes, it is possible to map axon fascicles and avoid stimulating them by inactivating electrodes or lowering stimulation current levels. Avoiding axon fascicle stimulation aims to remove brushstroke-like shapes in favor of a reduced set of pixel-like shapes. Approach: In this study, we propose the use of isotropic and anisotropic shapes to render intelligible images on the retina of a virtual patient in a reinforcement learning environment named rlretina. The environment formalizes the task as using brushstrokes in a stroke-based rendering task. Main Results: We train a deep reinforcement learning agent that learns to assemble isotropic and anisotropic shapes to form an image. We investigate which error-based or perception-based metric is adequate to reward the agent. The agent is trained in a model-based data generation fashion using the psychophysically validated axon map model to render images as perceived by different virtual patients. We show that the agent can generate more intelligible images than the naive method in different virtual patients. Significance: This work presents a new way to address epiretinal stimulation that constitutes a first step towards improving visual acuity in artificially restored vision using anisotropic phosphenes.

2606.03114 2026-06-03 cs.CV

FAF-CD: Frequency-Aware Fusion for Change Detection under Imperfect Multimodal Remote Sensing

FAF-CD: 面向不完美多模态遥感的频率感知融合变化检测

Yufan Wang, Sokratis Makrogiannis, Chandra Kambhamettu

AI总结 提出频率感知混合框架FAF-CD,通过DINOv3预训练ConvNeXt编码器、VMamba解码器及修正感知三支融合模块(可变形空间对齐+傅里叶/哈尔小波比较+自适应门控),在不完美异质遥感(如EO-SAR)和二元光学变化检测中提升精度并降低计算成本。

详情
Comments
Code will be released at https://github.com/VimsLab/FAF-CD
AI中文摘要

面向真实世界监测的遥感变化检测通常依赖于不完美的异质观测,其中事件前后图像可能异步、跨传感器,或受光照、季节和模态偏移影响。这一设置对EO-SAR灾害制图尤其具有挑战性,因为干扰变化可能类似于结构损伤。我们提出FAF-CD,一种频率感知混合框架,采用DINOv3预训练的ConvNeXt编码器和线性复杂度的基于VMamba的解码器。其修正感知三支融合模块将可变形空间对齐与傅里叶和哈尔小波比较相结合,使用自适应门控跨尺度聚合互补线索。在BRIGHT验证集上,匹配的异质EO-SAR适应在干净和扰动tc-mIoU/tc-mAP上优于NeXt2Former-CD。FAF-CD还泛化到二元光学变化检测,在LEVIR-CD上达到0.924 cF1,在WHU-CD上达到0.955 cF1,并在伪变化对齐压力测试下,在M-CD和NeXt2Former-CD中,在两个二元数据集上获得最佳平均扰动cIoU/cF1。相对于NeXt2Former-CD,它进一步降低了约24 GFLOPs的计算成本,同时保持或提高了精度。

英文摘要

Remote sensing change detection for real-world monitoring often relies on imperfect heterogeneous observations, where pre- and post-event images may be asynchronous, cross-sensor, or affected by illumination, seasonal, and modality shifts. This setting is especially challenging for EO-SAR disaster mapping, where nuisance variation can resemble structural damage. We propose FAF-CD, a frequency-aware hybrid framework with a DINOv3-pretrained ConvNeXt encoder and a linear-complexity VMamba-based decoder. Its rectification-aware tri-branch fusion module combines deformable spatial alignment with Fourier and Haar-wavelet comparisons, using adaptive gating to aggregate complementary cues across scales. On BRIGHT validation, a matched heterogeneous EO-SAR adaptation improves clean and perturbed tc-mIoU/tc-mAP over NeXt2Former-CD. FAF-CD also generalizes to binary optical CD, achieving 0.924 cF1 on LEVIR-CD and 0.955 cF1 on WHU-CD, and obtains the best average perturbed cIoU/cF1 on both binary datasets among M-CD and NeXt2Former-CD under pseudo-change-aligned stress tests. It further reduces cost by approximately 24 GFLOPs relative to NeXt2Former-CD while maintaining or improving accuracy.

2606.03113 2026-06-03 cs.CL

Experience-Driven Dynamic Exits for LLMs with Reinforcement Learning

基于强化学习的经验驱动动态退出机制用于大语言模型

Yanyu Zhu, Hoilam Pao, Niu Hu, Wei Guo, Shaoxiong Zhan, Boyu Lai, Zitai Wang, Yongqin Zeng, Hai-Tao Zheng

AI总结 针对大语言模型自回归推理慢的问题,提出LEDE框架,利用离线强化学习动态选择最优退出层和推测长度,实现2.0-2.7倍加速,并比静态推测基线额外提升17%速度。

详情
AI中文摘要

大语言模型遭受缓慢的自回归推理。虽然自我推测解码加速了这一过程,但其效率受到静态配置(如固定退出层和推测长度)的阻碍。我们将此优化重新构建为马尔可夫决策过程,并提出LEDE,一个使用离线强化学习的框架。LEDE学习一个策略,根据每一步生成序列的局部上下文动态选择最优退出层和推测长度,平衡计算成本和草稿质量。在Llama-2和Llama-3模型上的全面评估表明,LEDE相比自回归解码实现了高达2.0倍至2.7倍的加速,并且比静态推测基线额外提供了17%的加速。

英文摘要

Large Language Models suffer from slow autoregressive inference. While self-speculative decoding accelerates this process, its efficiency is hampered by static configurations like fixed exit layers and speculation lengths. We reframe this optimization as a Markov Decision Process and propose LEDE, a framework that uses offline reinforcement learning. LEDE learns a policy to dynamically select the optimal exit layer and speculation length based on the local context of the generated sequence at each step, balancing computational cost and draft quality. Comprehensive evaluations on Llama-2 and Llama-3 models show LEDE achieves up to 2.0× to 2.7× speedup over autoregressive decoding and provides an additional 17% speedup over the static speculative baselines.
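
The trade-off between computational cost and draft quality can be made concrete with a toy draft-then-verify step. The sketch below is an assumption-laden illustration, not LEDE's actual reward: it assumes greedy verification (a draft token is accepted only while it matches the full model's token), and the cost model and function names are hypothetical.

```python
def accepted_prefix_length(draft_tokens, verified_tokens):
    """Count how many leading draft tokens match the full model's greedy tokens."""
    n = 0
    for d, v in zip(draft_tokens, verified_tokens):
        if d != v:
            break
        n += 1
    return n

def step_reward(draft_tokens, verified_tokens, exit_layer, num_layers):
    """Tokens gained per unit of compute for one draft-then-verify step (toy cost model)."""
    accepted = accepted_prefix_length(draft_tokens, verified_tokens)
    # Assumed cost: each early-exit draft token uses exit_layer / num_layers of a full pass.
    draft_cost = len(draft_tokens) * exit_layer / num_layers
    # Assumed cost: one full forward pass verifies the whole draft in parallel.
    verify_cost = 1.0
    # The verification pass itself always contributes one token (correction or bonus).
    return (accepted + 1) / (draft_cost + verify_cost)
```

Under this toy model, a deeper exit layer raises the draft cost but tends to raise acceptance, and a longer speculation length pays off only while drafts keep being accepted; a policy that picks both per step is what the abstract describes.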

2606.03111 2026-06-03 cs.CV

Inverting the Generation Process of Denoising Diffusion Implicit Models: Empirical Evaluation and a Novel Method

反转去噪扩散隐式模型的生成过程:实证评估与新方法

Yan Zeng, Masanori Suganuma, Takayuki Okatani

AI总结 提出一种结合梯度下降和不动点方法的混合方法,用于从生成图像中恢复DDIM的初始噪声图,显著提高了预测精度和重建质量。

详情
AI中文摘要

本文研究了反转DDIM图像生成过程以从生成图像中恢复潜在变量(特别是初始噪声图)的问题。现有方法在此任务中常面临精度不足的挑战。我们提出了一种新颖的混合方法,该方法在第一步结合了通过梯度下降的直接反转,随后在后续步骤中采用不动点方法。在三个数据集上的实证评估表明,我们的方法显著提高了初始潜在变量的预测精度,同时实现了更优的重建准确性。此外,我们引入了一项新的评估指标,称为自插值测试,该测试评估从真实与预测潜在图之间的插值点生成的图像质量,从而提供对性能更深入的洞察。我们的结果表明,尽管现有方法在重建方面表现尚可,但它们始终无法准确预测初始潜在变量,导致在自插值测试中表现不佳。相比之下,我们的方法在所有指标上均优于其他方法,为扩散模型提供了宝贵的见解,并增强了其在图像生成和编辑中的应用。

英文摘要

This paper studies the problem of inverting the DDIM image generation process to recover latent variables, particularly the initial noise map, from a generated image. Existing methods often struggle with accuracy in this task. We propose a novel hybrid approach that combines direct inversion via gradient descent for the first step, followed by a fixed-point method for subsequent steps. Empirical evaluations across three datasets demonstrate that our method significantly improves the prediction of initial latent variables while achieving superior reconstruction accuracy. Additionally, we introduce a new evaluation, called the self-interpolation test, which assesses the quality of images generated from interpolated points between the true and predicted latent maps, offering deeper insights into performance. Our results reveal that while existing methods perform reasonably well in reconstruction, they consistently fail to accurately predict the initial latent variables, resulting in poor performance on the self-interpolation test. In contrast, our method outperforms all others across all metrics, providing valuable insights into diffusion models and enhancing their applications in image generation and editing.
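
The fixed-point component can be sketched on a single deterministic DDIM step. This is a minimal illustration under stated assumptions: `toy_eps` is a hypothetical stand-in for a trained noise predictor, the cumulative alpha values are arbitrary, and the gradient-descent inversion the paper uses for the first step is not shown.

```python
import numpy as np

def toy_eps(x, t):
    """Stand-in for a trained noise predictor (assumption for illustration only)."""
    return 0.1 * np.tanh(x)

def ddim_step(x_t, t, a_t, a_prev, eps_fn=toy_eps):
    """One deterministic DDIM step from x_t to x_{t-1} (a_* are cumulative alphas)."""
    eps = eps_fn(x_t, t)
    x0_pred = (x_t - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
    return np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps

def invert_step_fixed_point(x_prev, t, a_t, a_prev, eps_fn=toy_eps, iters=20):
    """Recover x_t from x_{t-1} by iterating the rearranged DDIM update."""
    # The DDIM step is x_prev = r * x_t + c * eps(x_t), so x_t = (x_prev - c * eps(x_t)) / r.
    r = np.sqrt(a_prev / a_t)
    c = np.sqrt(1.0 - a_prev) - r * np.sqrt(1.0 - a_t)
    x_t = x_prev.copy()  # initial guess
    for _ in range(iters):
        x_t = (x_prev - c * eps_fn(x_t, t)) / r
    return x_t
```

The iteration converges here because the toy predictor changes slowly with its input; with a real network, convergence is not guaranteed and would need to be checked empirically.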

2606.03108 2026-06-03 cs.AI

EvoTrainer: Co-Evolving LLM Policies and Training Harnesses for Autonomous Agentic Reinforcement Learning

EvoTrainer: 协同进化LLM策略与训练框架以实现自主智能体强化学习

Guhong Chen, Yingcheng Shi, Yongbin Li, Binhua Li, Xander Xu, Hu Wei, Shiwen Ni, Min Yang, Jieping Ye

AI总结 提出EvoTrainer框架,通过协同进化LLM策略和训练端框架,基于经验反馈自动诊断、修正并积累可复用技能,在数学推理、编程竞赛和仓库级软件工程任务上匹配或超越人工设计的RL基线。

详情
AI中文摘要

自主LLM训练通常被表述为配方搜索,这使训练框架基本保持静态。这种局限性在智能体RL中尤为突出,其中不断变化的瓶颈和标量奖励掩盖了多种失败模式。我们引入了EvoTrainer,一个通过经验反馈协同进化LLM策略和训练端框架的自主训练框架:它诊断rollout级别的证据、修正诊断、回测干预并积累可复用技能。在数学推理、竞赛编程代码生成和仓库级软件工程上的评估表明,在相同数据、代码库和评估协议下,EvoTrainer匹配或超过了人工设计的RL参考,其中在长周期智能体SWE上增益最大。轨迹分析显示,保留的策略在不同领域分化,进化的诊断阻止了无效的高分分支被提升,而可复用技能塑造了后续搜索。自主LLM RL应超越配方搜索,转向策略和解释它们的训练框架的联合进化。

英文摘要

Autonomous LLM training is often framed as recipe search, which leaves the training harness largely static. This limitation sharpens in agentic RL, where shifting bottlenecks and scalar rewards mask diverse failure modes. We introduce EvoTrainer, an autonomous training framework that co-evolves LLM policies and training-side harnesses through empirical feedback: it diagnoses rollout-level evidence, revises diagnostics, backtests interventions, and accumulates reusable skills. Evaluated on mathematical reasoning, competitive-programming code generation, and repository-level software engineering, EvoTrainer matches or exceeds the human-engineered RL references under the same data, codebase, and evaluation protocol, with the largest gain on long-horizon agentic SWE. Trajectory analyses show that retained strategies diverge across domains, evolving diagnostics prevent invalid high-scoring branches from being promoted, and reusable skills shape later search. Autonomous LLM RL should move beyond recipe search toward joint evolution of policies and the training harnesses that interpret them.

2606.03103 2026-06-03 cs.AI

DeskCraft: Benchmarking Desktop Agents on Professional Workflows and Human-in-the-Loop Collaboration

DeskCraft: 桌面代理在专业工作流与人在环协作中的基准测试

Wenkai Wang, Tao Xiong, Jingchen Ni, Yunpeng Bao, Xiyun Li, Tianqi Liu, Hongcan Guo, Zilong Huang, Shengyu Zhang

AI总结 提出DeskCraft基准,针对专业创意软件中的长周期工作流和主动人机协作,通过多级难度分类和交互协议评估18种代理,发现GPT-5.4在标准任务上达31.6%,交互任务上达27.6%。

详情
AI中文摘要

专业创意和工程软件中的真实桌面工作流通常跨越长时间跨度,并且往往需要人在环协调,代理在任务进行中主动寻求必要信息,用户提供额外指令、澄清、反馈或修正。然而,现有的桌面GUI基准大多将这一场景简化为短小、简单的任务,所有用户指令预先提供。为解决此问题,我们引入DeskCraft,一个针对长周期创意和工程工作流以及主动人机协作的桌面GUI基准。DeskCraft将任务组织成多级难度分类,长周期任务需要超过50个执行步骤,涵盖设计、视频、音频和3D创作等专业创意软件。此外,DeskCraft将人机协作形式化为一个交互协议,涵盖回合中和回合后交换。回合中交互捕捉代理在不确定性下主动发起的澄清和用户在执行过程中发起的打断,而回合后交互则容纳用户在代理发出完成信号后的反馈,共同覆盖现实协作模式的全空间。我们在538个任务上评估了18个专有和开源代理,发现GPT-5.4在标准任务上达到31.6%,在交互任务上达到27.6%。进一步分析揭示了长周期工作流交付和主动澄清方面的持续失败。我们将在以下网址开源所有评估代码、任务和数据:https://github.com/mrwwk/DeskCraft。

英文摘要

Real-world professional desktop workflows in specialized creative and engineering software unfold over long horizons and often require human-in-the-loop coordination, where agents proactively seek necessary information and users provide additional instructions, clarifications, feedback, or corrections as the task progresses. Yet existing desktop GUI benchmarks mostly reduce this setting to short, simplified tasks with all user instructions provided upfront. To address this issue, we introduce DeskCraft, a desktop GUI benchmark targeting long-horizon creative and engineering workflows and proactive human-agent collaboration. DeskCraft organizes tasks into a multilevel difficulty taxonomy, with long-horizon tasks requiring over 50 execution steps, and covers professional creative software across design, video, audio, and 3D creation. Furthermore, DeskCraft formalizes human-agent collaboration into an interaction protocol covering mid-turn and post-turn exchanges. Mid-turn interaction captures both agent-initiated clarification under uncertainty and user-initiated interruption during execution, while post-turn interaction accommodates user-driven feedback after the agent signals completion, together spanning the full space of realistic collaboration patterns. We evaluate 18 proprietary and open-source agents on 538 tasks and find that GPT-5.4 reaches 31.6% on standard tasks and 27.6% on interactive tasks. Further analyses reveal persistent failures in long-horizon workflow delivery and proactive clarification. We will open-source all evaluation code, tasks, and data at https://github.com/mrwwk/DeskCraft.

2606.03102 2026-06-03 cs.CL

Small RL Controller, Large Language Model: RL-Guided Adaptive Sampling for Test-Time Scaling

小规模RL控制器,大语言模型:基于RL引导的自适应采样用于测试时扩展

Runpeng Dai, Tong Zheng, Rui Liu, Chengsong Huang, Hongtu Zhu

AI总结 提出将自适应采样建模为马尔可夫决策过程,使用强化学习训练轻量级控制器,在答案正确性、延迟和计算成本之间取得平衡。

详情
AI中文摘要

测试时扩展提高了大语言模型的推理性能,但显著增加了总计算量和延迟。现有的自适应采样方法通过动态决定何时停止采样来部分缓解这一问题,但通常依赖启发式规则或分布假设。在这项工作中,我们将自适应采样建模为马尔可夫决策过程(MDP)。我们使用强化学习(RL)训练一个轻量级采样控制器,以联合平衡答案正确性、延迟和计算成本。在每一轮中,控制器决定停止采样或获取更多样本。我们的方法轻量级,仅依赖最终答案的统计信息,并且可以在CPU上训练和部署。我们进一步表明,该框架可以解释为具有显式预算约束的约束优化问题的拉格朗日松弛。与ASC和ESC等强基线相比,实验表明我们的方法在答案正确性、采样轮数和所需总样本数之间实现了更好的权衡。

英文摘要

Test-time scaling improves the reasoning performance of large language models but incurs substantial cost in both total computation and latency. Existing adaptive sampling methods partially mitigate this issue by dynamically deciding when to stop sampling, yet they typically rely on heuristic rules or distributional assumptions. In this work, we formulate adaptive sampling as a Markov decision process (MDP). We train a lightweight sampling controller with reinforcement learning (RL) to jointly balance answer correctness, latency, and computation cost. At each round, the controller decides to stop sampling or to acquire additional samples. Our method is lightweight: it relies only on statistics of final answers and can be trained and deployed on CPU. We further show that the resulting framework admits an interpretation as the Lagrangian relaxation of a constrained optimization problem with explicit budget constraints. Experiments against strong baselines such as ASC and ESC show that our method achieves improved trade-offs among answer correctness, sampling rounds, and total samples required.
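
One way to read the Lagrangian-relaxation interpretation is written out below. The notation is assumed for illustration and does not come from the abstract: $\pi$ is the controller policy, $C \in \{0,1\}$ indicates a correct final answer, $T$ is the number of sampling rounds, $N$ is the total number of samples, $B_T$ and $B_N$ are the budgets, and $\lambda, \mu \ge 0$ are multipliers.

```latex
\begin{aligned}
\max_{\pi}\quad & \mathbb{E}_{\pi}[C] \\
\text{s.t.}\quad & \mathbb{E}_{\pi}[T] \le B_T,\qquad \mathbb{E}_{\pi}[N] \le B_N, \\[4pt]
\mathcal{L}(\pi,\lambda,\mu) \;=\;& \mathbb{E}_{\pi}[C]
  - \lambda\bigl(\mathbb{E}_{\pi}[T] - B_T\bigr)
  - \mu\bigl(\mathbb{E}_{\pi}[N] - B_N\bigr).
\end{aligned}
```

For fixed multipliers, maximizing $\mathcal{L}$ over $\pi$ is equivalent to maximizing $\mathbb{E}_{\pi}[C - \lambda T - \mu N]$, an unconstrained RL objective in which each extra round and each extra sample carries a penalty. Under this reading, the penalty weights play the role of prices on latency and computation.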

2606.03099 2026-06-03 cs.CL cs.AI

PhotoCraft: Agentic Reasoning with Hierarchical Self-Evolving Memory for Deep Image Search

PhotoCraft: 具有层次自进化记忆的深度图像搜索代理推理

Kailin Lyu, Zhiqiang Yuan, Jianwei He, Qiwei Yan, Xuanbo Su, Nanxing Hu, Yang Liu, Ce Hao, Shengqian Qin, Lianyu Hu, Jinchao Zhang, Jie Zhou

AI总结 提出PhotoCraft,一种无需训练的分层记忆系统,通过工作、情景和语义记忆增强多模态大语言模型,实现深度图像搜索中的多步推理和知识迁移,在DISBench上提升检索性能达18.5%。

详情
AI中文摘要

深度图像搜索需要对丰富的上下文线索(如时间、地点和事件关系)进行多步推理。然而,现有的大语言模型代理大多是无状态和反应式的,缺乏持久记忆来维持长期上下文或跨任务迁移经验,这常常导致执行漂移和经验隔离。为了解决这些限制,我们提出了PhotoCraft,一种无需训练的分层记忆系统,用于照片搜索代理。受人类认知启发,PhotoCraft为多模态大语言模型配备了工作记忆、情景记忆和语义记忆,这些记忆在推理过程中被动态调用,以在多步推理和答案生成中保持逻辑一致性和知识可迁移性。在DISBench上的大量实验表明,PhotoCraft在不同多模态大语言模型骨干上持续改善了上下文感知检索,取得了高达18.5%的性能提升,并有效缓解了无记忆深度图像搜索中的关键瓶颈,为可靠且可泛化的多模态搜索代理提供了一条实用路径。

英文摘要

Deep Image Search requires multi-step reasoning over rich contextual cues, such as time, location, and event relations. However, most existing LLM-based agents are stateless and reactive, lacking persistent memory to maintain long-horizon context or transfer experience across tasks, which often leads to execution drift and experience isolation. To address these limitations, we propose PhotoCraft, a training-free, hierarchical memory system for photo-search agents. Inspired by human cognition, PhotoCraft equips MLLMs with working, episodic, and semantic memory, which are dynamically invoked during reasoning to preserve logical consistency and knowledge transferability throughout multi-step reasoning and answer generation. Extensive experiments on DISBench demonstrate that PhotoCraft consistently improves context-aware retrieval across diverse MLLM backbones, achieving gains of up to 18.5% and effectively mitigating key bottlenecks in memoryless deep image search, offering a practical path toward reliable and generalizable multimodal search agents.

2606.03097 2026-06-03 cs.AI

From Long News to Accurate Forecast: Importance-Aware Fusion and PRM-Guided Reflection for Time Series Forecasting

从长新闻到准确预测:面向时间序列预测的重要性感知融合与PRM引导的反思

Mingyang Liu, Qingcan Kang, Yuke Wang, Shixiong Kai, Kaichao Liang, Hui-Ling Zhen, Tao Zhong, Mingxuan Yuan, Linqi Song

AI总结 提出一种结合重要性感知新闻压缩和过程奖励模型(PRM)引导检索的框架,解决长新闻上下文窗口限制和迭代检索无引导问题,提升时间序列预测精度并减少迭代次数。

详情
AI中文摘要

将新闻纳入时间序列预测具有吸引力,因为新闻可以揭示仅凭历史值无法恢复的突发外生事件。然而,现有的基于LLM的新闻预测流程面临两个实际限制:相关新闻文章通常超过模型的上下文窗口,并且对补充新闻的迭代检索通常无引导,导致冗余更新和收敛缓慢。我们通过一个结合重要性感知新闻压缩和过程级检索监督的新框架来解决这些问题。首先,我们训练一个重要性奖励模型,该模型估计每篇文章的预测效用,并利用该信号在顺序成对融合期间分配压缩预算,在固定上下文限制内保留信息内容。其次,我们引入一个过程奖励模型(PRM),该模型根据当前误差分布和先前选择文章的历史对多个补充新闻候选进行排序,用质量控制的检索替代一次性盲目检索。两个组件均使用历史数据和真实值进行离线训练;推理使用冻结的过滤逻辑和压缩模块,无需任何反思循环。在金融、能源、交通和比特币预测基准上的实验表明,我们的方法在强基线上提高了预测精度,与迭代基线相比显著减少了细化迭代次数,并且在相关文章跨越数千个标记时仍然有效。

英文摘要

Incorporating news into time series forecasting is appealing because news can reveal abrupt exogenous events that historical values alone cannot recover. However, existing LLM-based news-forecasting pipelines face two practical limitations: relevant news articles often exceed the model's context window, and iterative retrieval of supplementary news is typically unguided, leading to redundant updates and slow convergence. We address these issues with a novel framework that combines importance-aware news compression and process-level retrieval supervision. First, we train an importance reward model that estimates the forecasting utility of each article and uses this signal to allocate compression budgets during sequential pairwise fusion, preserving informative content within a fixed context limit. Second, we introduce a process reward model (PRM) that ranks multiple supplementary-news candidates conditioned on the current error profile and the history of previously selected articles, replacing one-shot blind retrieval with quality-controlled selection. Both components are trained offline using historical data with ground truth; inference uses the frozen filtering logic and compression modules without any reflection loop. Experiments on finance, energy, traffic, and bitcoin forecasting benchmarks show that our method improves prediction accuracy over strong baselines, significantly reduces the number of refinement iterations compared to the iterative baseline, and remains effective when relevant articles span thousands of tokens.

2606.03094 2026-06-03 cs.LG

FGRPO: Federated GRPO with Adaptive Aggregation on Non-IID Data

FGRPO:非独立同分布数据上具有自适应聚合的联邦GRPO

Pengyu Chen, Shaowei Li, Kai Wang, Yunsheng Yuan, Kai Han, Jun Luo, Feng Li

AI总结 提出联邦GRPO(FGRPO)框架,通过基于相对性能增益的自适应聚合机制,在非独立同分布数据上实现去中心化推理模型微调,兼顾数据隐私与鲁棒收敛。

详情
AI中文摘要

语言模型的最新进展已将强化学习确立为引发自我纠正和长链推理的主要范式。虽然群体相对策略优化(GRPO)通过消除评论家网络提供了卓越的可扩展性,但将其部署在中央基础设施上需要从分布式所有者收集大量数据,这带来了显著的隐私风险。为了解决这些问题,我们引入了联邦GRPO(FGRPO),这是一个旨在跨异构数据所有者去中心化推理模型微调的框架。为了有效缓解异构任务间奖励尺度差异引起的不稳定性,FGRPO结合了一种基于相对性能增益的自适应聚合机制。通过刻画每个客户端相对于其个性化历史基线的改进,该框架动态地优先考虑有效的学习轨迹,而无需考虑局部任务的难度。FGRPO在非独立同分布数据上确保鲁棒收敛,同时保护数据隐私。

英文摘要

Recent advances in language models have established reinforcement learning as the primary paradigm for eliciting self-correction and long-chain reasoning. While group relative policy optimization (GRPO) offers superior scalability by eliminating the critic network, deploying it on a central infrastructure entails collecting a large volume of data from distributed owners, which poses significant privacy risks. To address these concerns, we introduce federated GRPO (FGRPO), a framework designed to decentralize the fine-tuning of reasoning models across heterogeneous data owners. To effectively mitigate the instability caused by divergent reward scales across heterogeneous tasks, FGRPO incorporates an adaptive aggregation mechanism based on relative performance gain. By characterizing each client's improvement relative to its personalized historical baseline, the framework dynamically prioritizes effective learning trajectories regardless of local task difficulty. FGRPO ensures robust convergence on non-IID data while preserving data privacy.
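
The idea of weighting clients by improvement over their own baseline, rather than by raw reward, can be sketched as follows. This is a hedged illustration and not FGRPO's actual rule: the softmax weighting, the temperature, and all function names are assumptions; only the group-standardized advantage is the standard GRPO construction.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one sampled group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

def gain_based_weights(current_scores, baseline_scores, temperature=1.0):
    """Softmax weights over clients from improvement relative to each client's own baseline."""
    cur = np.asarray(current_scores, dtype=float)
    base = np.asarray(baseline_scores, dtype=float)
    # Relative gain is comparable across clients whose rewards live on different scales.
    gain = (cur - base) / (np.abs(base) + 1e-8)
    z = (gain - gain.max()) / temperature  # subtract the max for numerical stability
    w = np.exp(z)
    return w / w.sum()

def aggregate(client_updates, weights):
    """Weighted average of client parameter updates (all updates share one shape)."""
    return np.tensordot(weights, np.stack(client_updates), axes=1)
```

For example, a client that moves from 0.5 to 0.75 on a hard task receives more weight than one that moves from 10 to 12 on an easy task, even though its raw score is far lower.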

2606.03093 2026-06-03 cs.AI

Decomposing how prompting steers behavior

分解提示如何引导行为

Fan L. Cheng, Nikolaus Kriegeskorte

AI总结 提出嵌套几何分解框架,通过刺激不变映射分析提示如何重塑表示几何,揭示跨维度线性混合是提示引导行为的关键机制。

详情
Comments
59 pages, 41 figures
AI中文摘要

提示引导大型语言模型(LLMs)和视觉语言模型(VLMs)无需权重更新,但指令变化如何重塑内部表示以产生行为仍不清楚。我们引入了一个嵌套几何分解框架,将提示视为对提示后内容表示几何的变换。对于每个提示对,我们使用越来越具表达力的刺激不变映射(平移、均匀缩放刚性变换、顺序轴缩放、仿射变换和非线性变换)对齐两个提示下相同刺激的表示。然后,我们通过将单个层的提示A隐藏状态替换为其映射版本,并测量提示B表示几何和行为的恢复程度,来因果测试每个映射。在三个LLM、三个VLM以及涵盖风格、情感、场景内容和数字的六个文本或图像数据集上,提示一致地将表示重塑为指示的任务结构。交叉验证的方差分解显示,许多提示诱导的激活变化由保持形状的映射(尤其是平移和均匀缩放刚性变换)捕获,而层级剖面揭示了跨层的模型和任务特定路由策略。关键的是,尽管平移和刚性层级已经改善了行为一致性,但仿射变换是第一个几乎完全恢复目标提示任务几何并带来相应行为增益的层级。这表明跨维度线性混合是提示将表示重组为指示任务结构的关键机制。我们的框架将提示诱导的表示变化分解为可解释的几何组件,并揭示了模型如何路由任务相关结构以产生提示驱动的行为。

英文摘要

Prompting steers large language models (LLMs) and vision-language models (VLMs) without weight updates, but it remains unclear how instruction changes reshape internal representations to produce behavior. We introduce a nested geometric decomposition framework that treats prompting as a transformation of the representational geometry of the content following the prompt. For each prompt pair, we align representations of the same stimuli under two prompts using increasingly expressive stimulus-invariant maps: translation, rigid transformation with uniform scaling, sequential axis scaling, affine transformation, and nonlinear transformation. We then causally test each map by replacing a single layer's prompt-A hidden state for held-out stimuli with its mapped counterpart and measuring recovery of prompt-B representational geometry and behavior. Across three LLMs, three VLMs, and six text or image datasets spanning style, emotion, scene content, and number, prompts consistently reshape representations toward the instructed task structure. Cross-validated variance decomposition shows that much prompt-induced activation change is captured by shape-preserving maps, especially translation and rigid transformation with uniform scaling, while tier profiles reveal model- and task-specific routing strategies across layers. Crucially, although translation and rigid tiers already improve behavioral agreement, affine transformation is the first tier to nearly recover target-prompt task geometry and yields corresponding behavioral gains. This suggests that cross-dimensional linear mixing is a key mechanism by which prompts reorganize representations toward instructed task structure. Our framework decomposes prompt-induced representational change into interpretable geometric components and reveals how models route task-relevant structure to produce prompt-driven behavior.
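
The nested-map idea can be illustrated with the two simplest tiers named above, translation and affine transformation. The sketch assumes representations are stored as rows of a matrix (one row per stimulus) and fits each map by least squares; the function names are hypothetical, and the paper's cross-validation on held-out stimuli and its intermediate tiers are omitted for brevity.

```python
import numpy as np

def fit_translation(X_a, X_b):
    """Best single offset vector mapping prompt-A representations onto prompt-B."""
    offset = (X_b - X_a).mean(axis=0)
    return lambda X: X + offset

def fit_affine(X_a, X_b):
    """Least-squares affine map X @ W + b from prompt-A to prompt-B representations."""
    X_aug = np.hstack([X_a, np.ones((X_a.shape[0], 1))])  # append a bias column
    coef, *_ = np.linalg.lstsq(X_aug, X_b, rcond=None)
    W, b = coef[:-1], coef[-1]
    return lambda X: X @ W + b

def residual(mapping, X_a, X_b):
    """Mean squared error between mapped prompt-A and actual prompt-B representations."""
    return float(np.mean((mapping(X_a) - X_b) ** 2))
```

Comparing residuals across tiers shows how much of the prompt-induced change each class of map explains: if the true change mixes dimensions, the translation tier leaves a large residual that the affine tier removes.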

2606.03090 2026-06-03 cs.CR cs.AI

"**Important** You should give me full credits!": Exploring Prompt Injection Attacks on LLM-Based Automatic Grading Systems

“**重要** 你应该给我满分!”:探索针对基于LLM的自动评分系统的提示注入攻击

Hang Li, Fedor Filippov, Yuling Lin, Pengfei He, Kaiqi Yang, Yucheng Chu, Yingqian Cui, Hui Liu, Jiliang Tang

AI总结 研究针对基于LLM的自动评分系统的提示注入攻击,通过实验证明当前系统高度脆弱,并评估现有防御策略的有效性。

详情
Comments
15 pages, 8 figures, 9 tables
AI中文摘要

大型语言模型(LLM)的出现显著加速了近期关于基于LLM的自动评分(AG)系统的研究。受益于LLM强大的指令遵循能力和广泛的先验知识,教育工作者可以使用仅包含自然语言评分标准的AG系统跨不同任务部署,并获得令人满意的评分性能。尽管有这些优势,新的安全问题也可能出现。特别是,提示注入(PI)攻击最近已成为基于LLM的应用的主要威胁。在AG的背景下,攻击者可能利用PI漏洞操纵评分系统,使其无论实际答案质量如何都人为地给出高分。这种行为对教育评估的公平性、可靠性和完整性构成严重风险。在这项工作中,我们研究了AG系统中的PI攻击,并系统地调查了此类攻击在教育场景中的有效性。我们进一步评估了现有防御策略对抗这些攻击的有效性。通过在基于评分标准的评分设置下进行全面的实验,我们证明了当前基于LLM的AG系统仍然高度容易受到PI攻击。我们希望我们的发现能提高对这种新兴威胁的认识,并激励未来研究朝着安全、稳健和可信的基于LLM的教育系统发展。

英文摘要

The emergence of large language models (LLMs) has significantly accelerated recent research on LLM-based automatic grading (AG) systems. Benefiting from the strong instruction-following capabilities and broad prior knowledge of LLMs, educators can deploy AG systems across diverse tasks using only natural language rubrics while achieving satisfactory grading performance. Despite these advantages, new security concerns may also arise. In particular, prompt injection (PI) attacks have recently become a major threat to LLM-based applications. In the context of AG, attackers can potentially exploit PI vulnerabilities to manipulate grading systems into assigning artificially high scores regardless of the actual answer quality. Such behavior poses serious risks to the fairness, reliability, and integrity of educational assessment. In this work, we study PI attacks in AG systems, and systematically investigate the effectiveness of such attacks in educational scenarios. We further evaluate the effectiveness of existing defensive strategies against these attacks. Through comprehensive experiments under rubric-based grading settings, we demonstrate that current LLM-based AG systems remain highly vulnerable to PI attacks. We hope that our findings raise awareness of this emerging threat and motivate future research toward secure, robust, and trustworthy LLM-based educational systems.

2606.03089 2026-06-03 cs.LG cs.AI

Constitutional On-Policy Safe Distillation

宪法性在策略安全蒸馏

Ming Wen, Yuxuan Liu, Kun Yang, Yunhao Feng, Zhuoer Xu, Yuhao Sun, Shiwen Cui, Xiang Zheng, Xingjun Ma, Yu-Gang Jiang

AI总结 针对在策略自蒸馏在安全对齐中因宪法条件导致教师分布收缩、表达能力下降的问题,提出宪法性在策略安全蒸馏(COPSD),通过交叉SFT冷启动校准教师分布,再进行宪法条件在策略蒸馏,在12个基准上实现了更优的安全-有用性权衡并降低安全税。

详情
AI中文摘要

在策略自蒸馏(OPSD)通过使用基于特权信息条件的教师提供密集的令牌级监督,已成为一种高效的后训练范式。先前工作表明,OPSD在可验证推理任务中可能崩溃,但安全对齐不同,它由高层宪法而非显式目标答案指导,因此是重新审视密集蒸馏的自然场景。然而,我们的初步研究表明,安全OPSD仍然遭受严重崩溃:宪法条件将教师分布收缩为短且过于保守的响应,而反向KL进一步将这种收缩放大为表达能力下降。我们将此效应形式化为非正交语义空间中安全边界下的几何泄漏,其中安全压力转移到表达能力维度。基于此分析,我们提出宪法性在策略安全蒸馏(COPSD),首先通过交叉SFT冷启动校准教师,然后执行宪法条件在策略蒸馏。在12个基准上的实验表明,COPSD比基线实现了持续更强的安全-有用性权衡,同时大幅降低了对通用推理能力的安全税。

英文摘要

On-policy self-distillation (OPSD) has emerged as an efficient post-training paradigm by using a teacher conditioned on privileged information to provide dense token-level supervision. Prior work has shown that OPSD can collapse in verifiable reasoning tasks, but safety alignment differs in that it is guided by high-level constitutions rather than explicit target answers, making it a natural setting to revisit dense distillation. However, our pilot study shows that safety OPSD still suffers from severe collapse: constitutional conditioning contracts the teacher distribution toward short and overly conservative responses, and reverse KL further amplifies this contraction into reduced expressiveness. We formalize this effect as geometric leakage under safety boundaries in a non-orthogonal semantic space, where safety pressure transfers into the expressiveness dimension. Based on this analysis, we propose Constitutional On-Policy Safe Distillation (COPSD), which first calibrates the teacher through a Cross-SFT cold-start and then performs constitution-conditioned on-policy distillation. Experiments on 12 benchmarks show that COPSD achieves a consistently stronger safety-helpfulness trade-off than baselines while substantially reducing the safety tax on general reasoning ability.

2606.03087 2026-06-03 cs.LG

Learning to Solve, Forgetting to Retain: Correct-Set Turnover in RLVR

学会解决,忘记保留:RLVR中的正确集更替

Chuanyu Qin, Chenxu Yang, Qingyi Si, Naibin Gu, Peng Fu, Zheng Lin

AI总结 针对强化学习可验证奖励(RLVR)中模型遗忘已解决问题的问题,提出正确集更替现象和修复窗口原则,并设计一种保留感知的回顾机制,通过零额外rollout开销的rollout前批次替换提升多模态任务性能。

详情
AI中文摘要

强化学习可验证奖励(RLVR)提升了大型语言模型的能力,然而头条准确率的提升往往掩盖了一个隐藏代价:随着训练进行,先前解决的问题悄然变得无法解决。我们将此现象定义为“正确集更替”,代表了在已掌握集上解决方案获取与退化的耦合动态。在此视角下,保留与获取一样成为明确的优化目标。我们分析并实证建立了“修复窗口原则”:恢复退化提示的成本随回顾延迟急剧增加,定义了一个标准RLVR流程未能利用的低成本窗口。为解决此问题,我们提出一种保留感知的回顾机制,追踪已掌握提示并定期重新引入以“提醒”模型先前的解决方案。通过利用rollout前批次替换,该机制不引入额外的rollout开销。在涵盖图像-文本、视频和纯文本任务的20个基准上,使用Qwen3-VL和Qwen2.5-Math进行评估,该机制在GRPO、DAPO和回放基线上持续提升性能,展示了跨模态和算法的稳健泛化能力。

英文摘要

Reinforcement learning with verifiable rewards (RLVR) improves the abilities of large language models, yet headline accuracy gains often conceal a hidden cost: previously solved problems quietly become unsolvable as training proceeds. We frame this phenomenon as correct-set turnover, representing the coupled dynamics of solution acquisition and regression over the mastered set. Under this view, retention becomes an explicit optimization target alongside acquisition. We analytically and empirically establish the repair-window principle: the cost of restoring a regressed prompt grows sharply with review delay, defining a low-cost window that standard RLVR pipelines fail to exploit. To address this, we propose a retention-aware review mechanism that tracks mastered prompts and periodically reintroduces them to remind the model of previous solutions. By utilizing pre-rollout batch replacement, the mechanism incurs zero additional rollout overhead. Evaluated across 20 benchmarks spanning image-text, video, and text-only tasks with Qwen3-VL and Qwen2.5-Math, it consistently improves performance over GRPO, DAPO, and replay baselines, demonstrating robust generalizability across modalities and algorithms.
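
Correct-set turnover can be measured directly from the sets of prompts solved at two checkpoints. The sketch below is a minimal illustration with hypothetical function names; it shows only the bookkeeping, not the paper's review schedule or batch-replacement logic.

```python
def correct_set_turnover(prev_correct, curr_correct):
    """Split the change between two checkpoints into acquired, regressed, and retained prompts."""
    prev_correct, curr_correct = set(prev_correct), set(curr_correct)
    return {
        "acquired": curr_correct - prev_correct,   # newly solved
        "regressed": prev_correct - curr_correct,  # previously solved, now failed
        "retained": prev_correct & curr_correct,   # solved at both checkpoints
    }

def net_accuracy_change(turnover, num_prompts):
    """Headline accuracy change, which hides the size of the regressed set."""
    return (len(turnover["acquired"]) - len(turnover["regressed"])) / num_prompts
```

Two runs with the same net accuracy change can differ widely in how many prompts regressed, which is why the regressed set is worth tracking as a quantity of its own.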

2606.03085 2026-06-03 cs.LG cs.CL

Multi-component Causal Tracing in Large Language Models

大型语言模型中的多组件因果追踪

Zirui Yan, Dennis Wei, Dmitriy A. Katz, Prasanna Sattigeri, Ali Tajer

AI总结 本文提出一个统一框架,通过软干预和度量转换高效识别对目标性能指标最关键的多组件子集,优于现有基线方法。

详情
Comments
Accepted to ACL 2026 main conference
AI中文摘要

因果追踪通过系统地干预大型语言模型(LLM)的内部表示,揭示并量化将特定输入或计算与特定感兴趣指标联系起来的因果路径,从而量化LLM的行为。在先前单组件或单层研究的基础上,本文提出了一个同时因果追踪多个组件的统一框架。该框架系统地识别对期望目标性能指标(如准确性和公平性)最关键的组件子集(例如注意力头和多层感知器神经元)。这是通过将灵活的干预应用于广泛期望的指标来实现的。为了解决多组件问题的组合复杂性,设计了一种高效算法,该算法利用软干预和精心设计的度量转换,将组合搜索问题转化为一个连续问题,该问题可以在适当约束下高效求解,从而为选择组件生成适当的二元决策。实验结果表明,所提出的方法高效地识别出对目标指标具有高影响力的模型组件子集,优于现有基线方法。我们的代码可从 https://github.com/ZiruiYan/multi-component-causal-tracing 获取。

英文摘要

Causal tracing systematically intervenes on a large language model's (LLM's) internal representations to uncover and quantify the causal pathways linking specific inputs or computations to specific metrics of interest, quantifying the LLM's behavior. Building on previous single-component or single-layer studies, this paper presents a unified framework for causally tracing multiple components simultaneously. This framework systematically identifies the subsets of components (e.g., attention heads and multi-layer perceptron neurons) most critical to a desired target performance metric (e.g., accuracy and fairness). This is achieved by incorporating flexible interventions applied to a wide range of desired metrics. To address the combinatorial complexity of the multi-component problem, an efficient algorithm is designed that leverages soft interventions and a carefully designed metric transformation, converting the combinatorial search problem into a continuous one that can be solved efficiently under proper constraints, thereby generating proper binary decisions for selecting components. Experimental results demonstrate that the proposed method efficiently identifies subsets of the model's components that have a high impact on the target metric, outperforming existing baseline approaches. Our code is available at https://github.com/ZiruiYan/multi-component-causal-tracing.

2606.03084 2026-06-03 cs.CV

Hierarchical Federated Learning with Dynamic Clustering and Adaptive Regularization for Robust Infrastructure Inspection

面向鲁棒基础设施检测的动态聚类与自适应正则化分层联邦学习

Yuhu Feng, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama

AI总结 提出一种分层联邦学习框架,通过宏观动态梯度聚类和微观自适应正则化解决基础设施检测中数据异构问题,实现鲁棒且特化的诊断模型。

详情
AI中文摘要

由于严格的隐私和安全法规,数据驱动计算机视觉模型在结构健康监测(SHM)中的应用受到数据孤岛困境的严重制约。虽然联邦学习(FL)提供了一种保护隐私的协作替代方案,但其在全国性基础设施网络中的应用受到“双重异构性”挑战的严重阻碍:不同结构类型之间的宏观物理差异以及本地数据集内的微观统计不平衡。为了克服这一挑战,本文提出了一种新颖的分层联邦学习框架。该框架协调了一种协同的两层优化策略。在宏观层面,一种基于动态梯度的聚类机制根据客户的结构退化轨迹自动将分布式客户聚合成专门的专家组,无需先验地理元数据。同时,在微观层面,一种簇内动态区域自适应近端正则化(DRAPR)模块为每个客户端计算实时统计的非独立同分布强度分数。通过基于局部标签偏斜和梯度发散自适应调整近端惩罚,DRAPR有效校准局部更新,减轻客户端漂移,并防止少数损伤类别的灾难性遗忘。在大型真实世界结构检测数据集上的综合评估表明,宏观聚类与微观正则化的分层集成成功中和了双层异构性,为复杂基础设施检测生成了高度鲁棒且特化的诊断模型。

英文摘要

The deployment of data-driven computer vision models for structural health monitoring (SHM) is heavily constrained by the data silo dilemma due to stringent privacy and security regulations. While federated learning (FL) offers a privacy-preserving collaborative alternative, its application to nationwide infrastructure networks is severely hindered by the challenge of "double heterogeneity": macro-level physical divergence across disparate structural types and micro-level statistical imbalances within local datasets. To overcome this challenge, this paper proposes a novel hierarchical federated learning framework. The framework orchestrates a synergistic two-tier optimization strategy. At the macro level, a dynamic gradient-based clustering mechanism autonomously aggregates distributed clients into specialized expert groups based on their structural degradation trajectories, circumventing the need for prior geographical metadata. Concurrently, at the micro level, an intra-cluster Dynamic Region-Adaptive Proximal Regularization (DRAPR) module computes a real-time statistical Non-IID Intensity Score for each client. By adaptively modulating a proximal penalty based on local label skewness and gradient divergence, DRAPR effectively calibrates local updates, mitigates client drift, and prevents the catastrophic forgetting of minority damage classes. Comprehensive evaluations on a large-scale, real-world structural inspection dataset demonstrate that the hierarchical integration of macro-clustering and micro-regularization successfully neutralizes dual-level heterogeneity, yielding highly robust and specialized diagnostic models for complex infrastructure inspection.