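arXivDaily: a daily academic digest that syncs the full arXiv feed and provides AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, and electrical engineering & systems.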

2606.03142 2026-06-03 cs.CV

Disentangling Visual and Factual Correctness in LVLMs' Visualization Literacy

Soohyun Lee, Jaeyoung Kim, Seokhyeon Park, Sihyeon Lee, Jiwon Song, Bohyoung Kim, Hyunjoo Song, Jinwook Seo

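AI summary: Proposes a framework that separates visual correctness from factual correctness, using counterfactual tests and arbitration metrics to show where LVLMs rely on memorized facts rather than visual reasoning in visualization literacy assessments.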

Comments: Under review at IEEE Transactions on Visualization and Computer Graphics (TVCG). 23 pages, 9 figures

Abstract

Large Vision-Language Models (LVLMs) show strong visualization interpretation, yet it is unclear whether their responses reflect genuine reasoning over visual evidence or factual priors learned during training. Current evaluations mix these two sources, obscuring when correct visual interpretation is overridden by memorized facts. We present a framework that isolates visual correctness from factual correctness, revealing validity limitations in existing visualization literacy assessments. Across three experiments with 15 state-of-the-art LVLMs: (1) several models reach human-level performance on standard tests (VLAT), but this may reflect factual recall rather than visual understanding, while randomized-data tests (reVLAT) underestimate literacy when correct visual interpretation is superseded by factual priors. (2) Using our Counterfactual Visualization Literacy Assessment Test (CVLAT) with capability-normalized arbitration metrics, we classify models by the sign of their visual-factual reliance index (VFRI), revealing a visualization-oriented majority and a factual knowledge-oriented minority, though several near-zero cases warrant caution. A human baseline (N=30) on the same counterfactual items confirms that people overwhelmingly follow the chart under conflict, providing a human reference point. (3) Prompt-based intervention can shift prioritization, but its effectiveness is highly model-dependent and direction-asymmetric, and high chart-reading capability does not predict prompt-controllability. Overall, high visualization accuracy is not sufficient evidence of faithful visual reasoning: reliable integration into visual analytics requires evaluating not only visualization literacy but also how models arbitrate between visual evidence and factual priors when the two diverge. Benchmark and code: https://github.com/JaeyoungKim-HCIL/CVLAT

2606.03137 2026-06-03 cs.AI

Think-Before-Speak: From Internal Evaluation to Public Expression in Multi-Agent Social Simulation

Kaiqi Yang, Tai-Quan Peng, Sanguk Lee, Hui Liu

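AI summary: Proposes the TBS framework, which separates agents' internal reasoning from public utterance generation to simulate the path from internal evaluation to public expression, and validates its mechanism sensitivity in climate policy discussions.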

Abstract

LLM-based multi-agent simulation offers a promising way to study social interaction, deliberation, and collective opinion dynamics. However, many existing dialogue simulation frameworks represent interaction mainly as observable turn exchange or aggregated outputs, leaving the internal evaluative processes behind silence, speaking intention, and public expression difficult to examine. We introduce TBS (Think-Before-Speak), an interval-based multi-agent simulation framework that separates agents' private reasoning from public utterance generation. At each interval, all agents update structured internal states based on the shared dialogue history and their own memory. These states include dissonance-related appraisal, perceived opinion climate, perceived isolation risk, response strategy, and willingness to speak. The orchestrator then resolves competing speaking intentions and commits one utterance to the public dialogue, allowing internal evaluation and public interaction to co-evolve over time. We evaluate TBS in simulated town hall discussions on a climate-related policy issue. Results show that TBS produces coherent internal-state traces and that these traces vary systematically across turn-allocation, silence, and memory conditions. Dissonance-related appraisal increases agents' willingness to speak, whereas silence-pressure appraisal decreases it. Once speaking intention is formed, public expression is shaped mainly by turn-allocation rules. These findings suggest that TBS supports mechanism-sensitive social simulation by making the pathway from internal evaluation to public expression observable and analyzable.
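
The orchestrator step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `willingness` field, the `threshold` parameter, and the highest-willingness allocation rule are all assumptions, since the abstract does not specify how competing speaking intentions are resolved.

```python
def select_speaker(states, threshold=0.5):
    """Pick the agent that speaks this interval, or None if everyone stays silent.

    `states` maps an agent id to its internal state dict. Only the hypothetical
    "willingness" field (a value in [0, 1]) is used here.
    """
    candidates = {
        agent: state["willingness"]
        for agent, state in states.items()
        if state["willingness"] >= threshold
    }
    if not candidates:
        return None  # no agent forms a speaking intention: a silent interval
    # Iterating in sorted order makes ties resolve to the alphabetically first id.
    return max(sorted(candidates), key=candidates.get)


def run_interval(states, history, draft_utterance):
    """Commit at most one utterance to the shared public history."""
    speaker = select_speaker(states)
    if speaker is not None:
        history.append((speaker, draft_utterance(speaker)))
    return speaker
```

Swapping `select_speaker` for a different rule (round-robin, random among willing agents) is one way to vary the turn-allocation condition the abstract mentions.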

2606.03136 2026-06-03 cs.CR cs.CL

PsychoPass: Geometric Profiling of Multi-Turn Adversarial LLM Conversations

Muberra Ozmen, Subhabrata Majumdar

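AI summary: Proposes the PsychoPass framework, which extracts geometric features of conversation trajectories in embedding space to predict multi-turn jailbreak attacks before harmful content is generated, and finds that the early geometric signal is robust.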

Abstract

Multi-turn jailbreak attacks on large language models (LLMs) reveal a mismatch in current guardrails: they operate on individual turns, while attacks unfold as trajectories across conversations. We propose a shift from content to dynamics, modeling conversations as paths in representation space and asking whether adversarial intent is encoded early in their geometry. We introduce PsychoPass, a framework that extracts geometric features from conversation trajectories in embedding space to predict a potential attack before harmful content is produced. These features achieve near-perfect performance in naïve classifiers, which is largely explained by the inclusion of number of turns as a feature. After removing this confound, a smaller but consistent geometric signal remains, with classification performance that does not depend meaningfully on encoder choice. Crucially, this signal appears early in the conversation: attack outcomes remain above chance from short prefixes alone, more reliably than baseline guardrails. A supporting theoretical analysis explains these findings via a decomposition of length and shape, a detection bound based on prefix length, and encoder invariance. Together, these results show that adversarial conversations leave an early, representation-robust geometric fingerprint suitable for online monitoring.
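
To make the idea of geometric features concrete, here is a small sketch of two length-normalised shape features of a conversation path. The feature names and definitions are assumptions chosen for illustration; the abstract does not list the features PsychoPass actually uses. Each point stands for the embedding of one turn.

```python
import math


def trajectory_features(points):
    """Shape features of a path through embedding space (one point per turn).

    Requires at least two points. Both features are ratios, so they do not
    simply grow with the number of turns.
    """
    steps = [[b - a for a, b in zip(p, q)] for p, q in zip(points, points[1:])]
    norms = [math.sqrt(sum(x * x for x in step)) for step in steps]
    path_length = sum(norms)
    displacement = math.dist(points[0], points[-1])
    # Straightness: 1.0 for a straight path, lower when the path wanders.
    straightness = displacement / path_length if path_length > 0 else 0.0
    # Mean cosine between consecutive steps: 1.0 means no change of direction.
    cosines = [
        sum(a * b for a, b in zip(s, t)) / (ns * nt)
        for s, t, ns, nt in zip(steps, steps[1:], norms, norms[1:])
        if ns > 0 and nt > 0
    ]
    mean_turn_cosine = sum(cosines) / len(cosines) if cosines else 1.0
    return {"straightness": straightness, "mean_turn_cosine": mean_turn_cosine}
```

Calling the function on `points[:k]` for a small `k` gives prefix features, which is the setting in which the abstract reports an early signal.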

2606.03135 2026-06-03 cs.AI

Uncertainty-Aware Clarification in LLM Agents with Information Gain

Mengyi Deng, Zhiwei Li, Xin Li, Tingyu Zhu, Ying Zhao, Zhijiang Guo, Wei Wang

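AI summary: To address erroneous tool actions by LLM agents caused by underspecified user instructions, proposes a clarification framework guided by an information gain reward. Bayesian belief updates quantify the utility of clarification questions, and the agent is trained to ask high-information-gain clarifications. In a τ-Bench environment, the task success rate improves by 3.7% while adding only 0.3 interaction steps.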

Journal ref: ICML 2026

Abstract

Large Language Model (LLM) agents often operate under underspecified user instructions, where latent uncertainty over user intent leads to erroneous tool actions. To address this challenge, we propose a goal-oriented clarification framework that aligns clarification behavior with ambiguity resolution. Central to our approach is the Information Gain Reward, a metric that quantifies the utility of clarification questions by measuring the Bayesian belief update towards the ground-truth goal induced by the clarification exchange. We train the clarifier (LLM) using this reward to optimize for high information gain, ensuring that clarifications effectively reduce uncertainty and improve task completion within the agent-tool-user environment. We validate our framework within a clarification-enhanced τ-Bench environment, conducting cross-agent evaluations across five heterogeneous backbones. Empirical results demonstrate that our method consistently improves the success rate by 3.7% over the no-clarification baseline, while adding only 0.3 total interaction steps on average.
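
One way to read "Bayesian belief update towards the ground-truth goal" is as the change in log-probability that the belief assigns to the true goal after the user answers. The sketch below assumes that reading, a discrete set of candidate goals, and a known likelihood of the user's answer under each goal; the paper's exact reward definition may differ.

```python
import math


def bayes_update(prior, likelihood):
    """Posterior over goals after observing the user's answer.

    `prior[g]` is the belief in goal g before the clarification, and
    `likelihood[g]` is the probability of the observed answer if g were true.
    """
    unnormalised = {g: prior[g] * likelihood[g] for g in prior}
    total = sum(unnormalised.values())
    return {g: value / total for g, value in unnormalised.items()}


def information_gain_reward(prior, likelihood, true_goal):
    """Log-probability gained on the true goal (assumed form of the reward)."""
    posterior = bayes_update(prior, likelihood)
    return math.log(posterior[true_goal]) - math.log(prior[true_goal])
```

Under this form, a question whose answer is equally likely under every goal earns a reward of zero, and a question that shifts belief away from the true goal earns a negative reward.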

2606.03134 2026-06-03 cs.RO cs.LG

How Visible Are Silent Manipulation Failures? An Observability Study of False-Success Detection in Simulated Robot Episodes

Aarav Bedi

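AI summary: Using simulated bimanual ALOHA tasks, this study examines how recoverable false successes are (episodes that actually failed but were judged successful) among episodes the robot's own success detector flagged as successful. Joint-data detectors recover false successes almost fully in cube transfer but only partly in peg insertion, where a vision detector closes the gap, and the separability rests on velocity differences far below realistic sensor noise.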

Comments: 4 pages, 3 figures

Abstract

Imitation-learning policies for robot manipulation inherit the quality of the success labels attached to their training episodes, and those labels are usually produced by the robot's own success check. A particularly damaging error is the false success: an episode the robot logs as a success when the task outcome was actually wrong. We ask a narrow but practical question about these episodes. Once an episode has already been flagged as a success, how much of the information needed to overturn that label is present in proprioception, and how much requires vision? We build a simulated testbed on two bimanual ALOHA tasks, induce failures through environment perturbations rather than label edits, label every episode by privileged simulator state that the detector never sees, and keep only episodes the robot flagged as successful. We then compare detectors restricted to proprioception against a vision-based detector. We find that recoverability spans a wide range: in cube transfer the false successes are almost fully recoverable from joint data alone, while in peg insertion proprioception recovers only part of them and a vision detector closes most of the gap. We also show that the proprioceptive separability we measure rests on velocity differences far below any realistic sensor noise floor, so it is best read as an optimistic upper bound that a noiseless simulator inflates. We release the generation and evaluation pipeline.

2606.03132 2026-06-03 cs.CL

DMT-CBT: Longitudinal Therapeutic State Modeling for CBT Counseling

Chang Liu, Shuyi Zhang, Changsheng Ma, Yongfeng Tao, Minqiang Yang, Bin Hu

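AI summary: Proposes the DMT-CBT framework, which uses structured cross-session therapeutic states, multimodal behavioral grounding, and tool-augmented intervention to address existing methods' reduction of CBT counseling to local response generation, enabling dynamic modeling of longitudinal therapeutic states.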

Abstract

Large language models (LLMs) have shown growing potential for Cognitive Behavioral Therapy (CBT) counseling. However, most existing approaches still formulate counseling as a local response generation problem, focusing on empathetic replies within short, text-only, or single-session interactions. We argue that this formulation fundamentally mismatches the nature of real psychotherapy. In clinical CBT, therapy is a longitudinal process in which therapists continuously infer, update, and intervene on evolving therapeutic states across sessions. Realistic CBT further involves multimodal inference and delayed cross-session intervention effects, requiring models to capture longitudinal therapeutic state evolution under partial observability. We propose DMT-CBT, a framework for Dynamic Modeling of evolving Therapeutic states in CBT counseling. DMT-CBT maintains structured therapeutic states across sessions while incorporating multimodal behavioral grounding and tool-augmented intervention to support adaptive therapeutic reasoning. Based on this framework, we construct DMTCorpus, a synthetic multi-session multimodal CBT counseling dataset featuring evolving therapeutic states, image-grounded client behaviors, and cross-session intervention continuity. Experimental results show that DMT-CBT improves counseling fidelity and therapeutic alliance, produces more favorable longitudinal affective trajectories, and preserves therapeutic states more faithfully than post-hoc extraction approaches.

2606.03131 2026-06-03 cs.LG

HARVE: Hacking-Aware Reward-Head Vector Editing for Robust Reward Models

Shuang Liu, Yuxuan Bo, Qiuyang Zhao, Caiyue Huang, Xiaorong Chen, Yanguang Liu, Mengnan Du

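AI summary: To address reward models' vulnerability to reward hacking, proposes HARVE, a training-free reward-head editing method that removes the components of the reward-head vector aligned with a hacking-related subspace, improving robustness while preserving general capability.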

Abstract

Reward models are central to large language model (LLM) alignment, but they remain vulnerable to reward hacking. To evaluate reward-model robustness, we introduce RewardHackBench, which contains 13 reward-hacking patterns covering real-life high-stakes domains and general settings, and we find severe failures on specific subcategories across eight reward models. To mitigate these failures, we propose HARVE, a training-free reward-head editing method for scalar reward models. Instead of fine-tuning the reward model, HARVE identifies a multi-directional hacking subspace from residual stream directions associated with selected hacking subcategories, and removes the component of the reward-head vector aligned with that subspace. This directly reduces the reward head's sensitivity to hacking-related features using only a small set of contrastive gold-hacked examples, without gradient updates or fine-tuning. Comprehensive experiments across eight reward models indicate that HARVE improves hacking robustness, outperforms fine-tuning baselines, and preserves the reward models' general capability. Further analyses suggest that reward hacking is better captured as a multidimensional residual-space structure than by isolated surface cues.
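
The core editing step, removing the part of the reward-head vector that lies in a subspace, is an orthogonal projection. The sketch below shows that step only, in plain Python. How HARVE derives the hacking directions from gold-hacked pairs is not covered here: `directions` is simply assumed to be given, and the function names are illustrative.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(directions, tol=1e-10):
    """Gram-Schmidt: orthonormal basis for the span of the given directions."""
    basis = []
    for direction in directions:
        v = list(direction)
        for q in basis:
            c = dot(v, q)
            v = [x - c * y for x, y in zip(v, q)]
        norm = math.sqrt(dot(v, v))
        if norm > tol:  # skip directions already inside the span
            basis.append([x / norm for x in v])
    return basis


def edit_reward_head(head, directions):
    """Return the reward-head vector with its in-subspace component removed."""
    edited = list(head)
    for q in orthonormal_basis(directions):
        c = dot(edited, q)
        edited = [x - c * y for x, y in zip(edited, q)]
    return edited
```

After the edit, the scalar reward `dot(edited, h)` no longer changes when a hidden state `h` moves along any of the given directions, and no gradient step is involved.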

2606.03130 2026-06-03 cs.LG

Synthetic Hallucinations, Real Gains: Hard Negatives from Frontier Models for FIM Hallucination Mitigation

Mahdi Erfanian, Nelson Daniel Troncoso, Aashna Garg, Amabel Gale, Xiaoyu Liu, Pareesa Ameneh Golnari, Shengyu Fu

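AI summary: To address Fill-in-the-Middle (FIM) hallucinations produced by small open-source code models in IDE autocomplete, proposes an execution-free alternative: frontier code models synthesize plausible-but-wrong completions as hard negatives, and the contrast between these synthetic hallucinations and real developer edits serves as a supervised fine-tuning signal, raising exact match on the Delulu benchmark by 18.8 points.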

Abstract

Small open-source code models that power IDE autocomplete still emit hallucinated Fill-in-the-Middle (FIM) completions: syntactically natural calls to methods, parameters, variables, and imports that do not exist in the surrounding project. Existing mitigations either require per-language execution sandboxes that do not apply at mid-keystroke or preference-optimisation pipelines that need large human-labelled corpora. We propose an execution-free alternative: use frontier code models to synthesise plausible-but-wrong completions as hard negatives, then leverage the contrast between these synthetic hallucinations and the ground-truth developer edit as a supervised fine-tuning signal. Our pipeline scrapes multilingual FIM contexts from public GitHub across eight languages and asks a panel of three frontier generators to produce one hard negative per context for each of four hallucination types drawn from the Delulu taxonomy, a Docker-verified multilingual FIM hallucination benchmark, yielding a paired chosen/rejected dataset. Fine-tuning Qwen2.5-Coder-7B-Instruct on a 100K-row curated subset lifts Delulu exact match by +18.8 points and edit similarity by +0.22 on every language and every type, while also improving every HumanEval-Infilling split and every SAFIM subset. The same recipe at 3B lifts Delulu by +12.8 EM with a small, characterised general-FIM trade-off. Five-axis ablations (size, type mix, language coverage, base-model family, and a difficulty-aware fool rate) plus a head-to-head SFT vs. DPO/ORPO comparison map which design choices drive the gain. We release the full pipeline source code (generation, fool-rate LLM judging, curation, and the FIM fine-tuning recipe) so that the experiments in this paper can be reproduced end-to-end on any permissively licensed corpus.

2606.03128 2026-06-03 cs.CR cs.AI cs.CL cs.LG

Decoupled Smart Contract Audits: Lightweight LLM Framework via Distillation and Aggregation

Bagus Rakadyanto Oktavianto Putra, Muhamad Risqi Utama Saputra, Widyawan, Guntur Dharma Putra

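AI summary: Proposes a decoupled smart contract audit framework built on lightweight open-source LLMs (0.6B-4B parameters). Using rsLoRA, knowledge distillation, and a Chain-of-Verification aggregation strategy, it reaches 98.25% accuracy in vulnerability detection, outperforming 7B-34B parameter models.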

Comments: 12 pages, 4 figures, 5 tables. Accepted to IEEE ICWS 2026

Abstract

Smart contracts face critical security challenges that require thorough auditing in decentralized web services. While Large Language Models (LLMs) have shown promise in automated vulnerability detection, existing approaches lack severity evaluations with actionable remediation and demand unnecessarily massive computational overhead. In this study, we introduce an efficient end-to-end smart contract security audit framework utilizing lightweight, highly optimized open-source LLMs (0.6B-4B parameters). Our framework decouples comprehensive audit tasks into four interconnected components: vulnerability detection, explanation, severity classification, and remediation recommendation. To maintain high accuracy without massive parameters, we implement Rank-Stabilized Low-Rank Adapters (rsLoRA), knowledge distillation, and a custom Chain-of-Verification (CoVe) aggregation strategy to systematically screen and consolidate multiple draft responses from the model into a highly accurate audit report. Experimental results demonstrate that our lightweight pipeline consistently outperforms state-of-the-art open-source coder dense LLMs (7B to 34B parameters), achieving 98.25% accuracy in vulnerability detection and an alignment score of 0.4375 in generative explanation tasks. Furthermore, our extensive ablation studies empirically validate the superiority of our decoupled audit processes over unified prompting and uncover a novel severity centrality bias, establishing a critical benchmark for future research in LLM-assisted auditing.

2606.03127 2026-06-03 cs.RO

TTT-VLA: Test-Time Latent Prompt Optimization for Vision-Language-Action Models

Wenbo Zhang, Jianxiong Li, Shuai Yang, Sijin Chen, Jiajun Liu, Lingqiao Liu, Xiao Ma

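AI summary: Proposes the TTT-VLA framework, which adapts to distribution shift by optimizing a latent prompt at test time without modifying the policy itself, improving task success rates on SimplerEnv in single- and multi-embodiment settings.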

Abstract

Vision-Language-Action (VLA) models trained on large-scale data have made remarkable progress, but they remain vulnerable to distribution shifts at deployment time. Recent VLA models suggest that prompts can serve as an efficient interface for steering policy behavior, but existing prompt-based steering typically relies on external guidance. This raises a natural question: can test-time training (TTT) for VLA be achieved by optimizing a prompt, so that the steering interface itself can be learned and adapted from interaction? We address this question with TTT-VLA, a test-time training framework based on Latent Prompt Optimization (LPO). During training, the latent prompt is learned with an additional proxy task, providing an extra learned conditioning signal for policy learning. At test time, TTT is performed by collecting interaction data from the current environment and optimizing only the latent prompt on those data using the proxy task's self-supervised signal, without modifying the policy itself. Experiments on SimplerEnv demonstrate that the proposed method consistently improves task success rates in both single- and multi-embodiment settings. Further analysis shows that the gains arise primarily from correcting a small number of critical decisions rather than globally altering policy behavior. These results suggest that LPO provides an effective and practical pathway for deployment-time improvement of foundation manipulation policies.

2606.03125 2026-06-03 cs.LG

Rethinking Neural Width for Alternating Current Optimal Power Flow Proxies

Dhruvi Khandelwal, Anurag Basistha, Ayushi Jolotia, Parikshit Pareek

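AI summary: This paper proposes a loss-guided neural densification algorithm that grows network capacity step by step to minimize width while accurately approximating the AC optimal power flow manifold, matching baseline performance on several IEEE systems with ten times fewer neurons.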

2606.03121 2026-06-03 cs.LG

TiWeaver: Unified Temporal Dynamics Modeling via Contextual Patching

Zhe Li, Jindong Tian, Hao Miao, Zhi Lei, Chenjuan Guo, Bin Yang

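AI summary: To address the complex dynamics and asynchronous inter-channel dependencies that irregularities such as missing values and non-uniform sampling cause in multivariate time series, proposes the TiWeaver framework, which models them adaptively with a Graph-Guided Adaptive Tokenizer (G²AT) and a Fine-grained Asynchronous Dependency Extractor (FADE), achieving gains of up to 25% on 12 datasets.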

DOI: 10.1145/3770855.3817748

Abstract

Multivariate time series forecasting plays a critical role in real-world applications, including weather prediction, stock analysis, and health monitoring. Due to the diversity of data sources, time series exhibit diverse temporal dynamics, often accompanied by various irregularities such as missing values and non-uniform sampling frequencies. Such irregularities lead to complex and asynchronous temporal dependencies across channels. Thus, a single model with a fixed patching scheme often fails to adapt well to diverse multivariate time series, hindering accurate forecasting. In this paper, we propose TiWeaver, a unified framework designed to handle temporal dynamics and fine-grained inter-channel dependencies adaptively. Specifically, we introduce a Graph-Guided Adaptive Tokenizer (G²AT) that divides time series into highly contextually coherent patches by jointly considering temporal density and representation consistency. In addition, we propose a Fine-grained Asynchronous Dependency Extractor (FADE), which is designed to model fine-grained asynchronous inter-channel dependencies while incorporating long-term historical dependencies. We evaluate TiWeaver on 12 real-world time series datasets, where it achieves state-of-the-art performance, outperforming existing methods by up to 25%. These results demonstrate its robustness and effectiveness across diverse domains and data characteristics.

2606.03120 2026-06-03 cs.CV

KC-3DGS: Kurtosis-Constrained Gaussian Splatting for High-Fidelity View Synthesis

Vivekjyoti Banerjee, Abhay Yadav, Rama Chellappa, Aniket Roy

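AI summary: Proposes KC-3DGS, which adds a multi-scale alignment loss, a kurtosis concentration loss, and a cross-band covariance penalty in the wavelet domain to improve the perceptual quality of 3DGS, especially high-frequency detail and structural artifacts under sparse views.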

Abstract

3D Gaussian Splatting (3DGS) enables real-time novel view synthesis by representing scenes as collections of anisotropic Gaussians optimized via differentiable rasterization. However, standard pixel-space losses (L1, SSIM) constrain only aggregate reconstruction error, permitting the optimization to redistribute error across frequency scales. This leads to oversmoothing and structural artifacts, particularly in sparse-view settings where supervision is limited. We propose KC-3DGS, which augments 3DGS training with wavelet-domain supervision based on natural image statistics. Our method combines three components: (1) a multi-scale wavelet coefficient alignment loss that explicitly penalizes missing high-frequency detail, (2) a supervised kurtosis concentration loss that encourages rendered images to match the heavy-tailed frequency statistics of ground-truth images, and (3) a cross-band covariance penalty that promotes frequency specialization. We provide theoretical analysis showing that pixel-space losses admit a family of indistinguishable perturbations under wavelet redistribution, and that our joint objective excludes degenerate solutions. Experiments across MipNeRF360, Tanks&Temples, MVImgNet, DeepBlending, and WRIVA-ULTRRA demonstrate consistent improvements in perceptual quality. On the challenging WRIVA-ULTRRA outdoor dataset, KC-3DGS achieves a 9.48% improvement in DreamSim while also improving PSNR, SSIM, and LPIPS. In sparse-view settings with only 12 training images, our method improves PSNR by up to 0.5 dB on MipNeRF360 while maintaining perceptual quality. The approach integrates seamlessly into existing 3DGS pipelines as a plug-and-play regularization strategy.
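
As a rough illustration of the kurtosis idea, the sketch below computes the kurtosis of Haar detail coefficients and compares it between a rendered and a ground-truth signal. It is a deliberate simplification under stated assumptions: it works on a 1-D signal with a single Haar level, and the absolute-difference form of `kurtosis_gap` is a guess at the shape of such a loss, not the loss defined in the paper.

```python
import math


def haar_detail(signal):
    """One level of 1-D Haar detail (high-frequency) coefficients."""
    return [
        (signal[i] - signal[i + 1]) / math.sqrt(2)
        for i in range(0, len(signal) - 1, 2)
    ]


def kurtosis(values):
    """Fourth standardised moment; larger values indicate heavier tails."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) if m2 > 0 else 0.0


def kurtosis_gap(rendered, target):
    """Mismatch between the detail-band kurtosis of two signals."""
    return abs(kurtosis(haar_detail(rendered)) - kurtosis(haar_detail(target)))
```

A signal with one sharp edge yields mostly zero detail coefficients plus one large one, so its detail band has higher kurtosis than a uniformly oscillating signal. That heavy-tailed pattern is the statistic the abstract says rendered images should match.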

2606.03119 2026-06-03 cs.CV cs.AI cs.LG

GuidedBridge: Training-freely Improving Bridge Models with Prior Guidance

Zehua Chen, Yucheng Yang, Binjie Yuan, Kaiwen Zheng, Jun S. Liu, Jun Zhu

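AI summary: Proposes the training-free Prior Guidance (PG) and Frequency-Modulated Prior Guidance (FMPG) methods, which strengthen prior exploitation in bridge models by contrasting a weak prior with the seen prior, and designs a cascaded framework, CFG-FMPG, for image in-painting. Experiments show consistent gains for pre-trained bridge models across image translation tasks.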

Comments: ICML 2026

Abstract

Guidance methods, such as classifier-free guidance (CFG) and auto-guidance (AG), have advanced noise-to-data generation in diffusion models. Recently, bridge models have introduced a data-to-data generative process that can exploit an instructive clean prior. In this work, inspired by previous methods that use a quality difference between denoising results as guidance, we propose a training-free bridge guidance method, termed Prior Guidance (PG). Specifically, we introduce a weak prior, which is unseen during bridge pre-training, hindering prior exploitation and thereby degrading the denoising result. Then, we contrast it with the seen prior to highlight and enhance prior exploitation via a scaling factor. Moreover, we analyze the underlying mechanism of prior exploitation in the bridge process and design frequency-modulated prior guidance (FMPG), which tailors the guidance scale to low- and high-frequency bands coherent with bridge generative dynamics. To address prior exploitation in image in-painting, we develop a cascaded framework, CFG-FMPG, which first generates a noisy hidden representation via CFG and then exploits it as a generative prior with FMPG, combining their complementary strengths without compromising inference efficiency. Experiments demonstrate that our PG methods consistently improve pre-trained bridge models across diverse image translation tasks.
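
The contrast between the seen prior and the weak prior can be pictured as a CFG-style extrapolation. The sketch below assumes that form by analogy with classifier-free guidance; the abstract only says the two results are contrasted "via a scaling factor", so the exact update rule and the names used here are assumptions.

```python
def prior_guidance(denoised_seen, denoised_weak, scale):
    """Extrapolate from the weak-prior result toward the seen-prior result.

    Both inputs are denoising outputs of the same pre-trained bridge model at
    the same step, given as flat lists of floats. A scale of 1.0 returns the
    seen-prior result unchanged; a scale above 1.0 amplifies the difference.
    """
    return [
        weak + scale * (seen - weak)
        for seen, weak in zip(denoised_seen, denoised_weak)
    ]
```

A frequency-modulated variant would apply different `scale` values to the low- and high-frequency bands of the difference, which this sketch does not attempt.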

2606.03118 2026-06-03 cs.LG cs.CV q-bio.NC

Learning to See via Epiretinal Implant Stimulation in silico with Model-Based Deep Reinforcement Learning

Jacob Lavoie, Marwan Besrour, William Lemaire, Jean Rouat, Réjean Fontaine, Eric Plourde

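AI summary: This study proposes using isotropic and anisotropic shapes, assembled by a deep reinforcement learning agent, to render intelligible images on a virtual patient's retina and improve the clarity of artificially restored vision.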

DOI: 10.1088/2057-1976/acf1a5
Journal ref: Biomed. Phys. Eng. Express 10 (2024) 025006
Comments: 18 pages, 6 figures. Published version: Biomed. Phys. Eng. Express 10, 025006 (2024)

Abstract

Objective: Diseases such as age-related macular degeneration and retinitis pigmentosa cause the degradation of the photoreceptor layer. One approach to restore vision is to electrically stimulate the surviving retinal ganglion cells with a microelectrode array such as an epiretinal implant. Epiretinal implants are known to generate visible anisotropic shapes elongated along the axon fascicles of neighboring retinal ganglion cells. Recent work has demonstrated that to obtain isotropic pixel-like shapes, it is possible to map axon fascicles and avoid stimulating them by inactivating electrodes or lowering stimulation current levels. Avoiding axon fascicle stimulation aims to remove brushstroke-like shapes in favor of a more reduced set of pixel-like shapes. Approach: In this study, we propose the use of isotropic and anisotropic shapes to render intelligible images on the retina of a virtual patient in a reinforcement learning environment named rlretina. The environment formalizes the task as using brushstrokes in a stroke-based rendering task. Main Results: We train a deep reinforcement learning agent that learns to assemble isotropic and anisotropic shapes to form an image. We investigate which error-based or perception-based metric is adequate to reward the agent. The agent is trained in a model-based data generation fashion using the psychophysically validated axon map model to render images as perceived by different virtual patients. We show that the agent can generate more intelligible images compared to the naive method in different virtual patients. Significance: This work shares a new way to address epiretinal stimulation that constitutes a first step towards improving visual acuity in artificially-restored vision using anisotropic phosphenes.

2606.03114 2026-06-03 cs.CV

FAF-CD: Frequency-Aware Fusion for Change Detection under Imperfect Multimodal Remote Sensing

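FAF-CD: Frequency-Aware Fusion Change Detection for Imperfect Multimodal Remote Sensing

Yufan Wang, Sokratis Makrogiannis, Chandra Kambhamettu

AI summary: Proposes FAF-CD, a frequency-aware hybrid framework that combines a DINOv3-pretrained ConvNeXt encoder, a VMamba decoder, and a rectification-aware three-branch fusion module (deformable spatial alignment, Fourier/Haar wavelet comparison, and adaptive gating) to improve accuracy and reduce computational cost in imperfect heterogeneous remote sensing (e.g., EO-SAR) and binary optical change detection.

Details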

Comments: Code will be released at https://github.com/VimsLab/FAF-CD

AI-translated abstract

Decomposing how prompting steers behavior

Fan L. Cheng, Nikolaus Kriegeskorte

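AI summary: Proposes a nested geometric decomposition framework that uses stimulus-invariant maps to analyze how prompts reshape representational geometry, and shows that cross-dimensional linear mixing is a key mechanism by which prompts steer behavior.

Details

Comments: 59 pages, 41 figures

AI-translated abstract

Prompts steer large language models (LLMs) and vision-language models (VLMs) without weight updates, but how changes in instructions reshape internal representations to produce behavior remains unclear. We introduce a nested geometric decomposition framework that treats a prompt as a transformation of the representational geometry of the content that follows it. For each prompt pair, we align the representations of the same stimuli under the two prompts using increasingly expressive stimulus-invariant maps: translation, rigid transformation with uniform scaling, sequential axis scaling, affine transformation, and nonlinear transformation. We then causally test each map by replacing the prompt-A hidden state at a single layer with its mapped version and measuring how well prompt-B representational geometry and behavior are recovered. Across three LLMs, three VLMs, and six text or image datasets covering style, emotion, scene content, and number, prompts consistently reshape representations toward the instructed task structure. Cross-validated variance decomposition shows that much of the prompt-induced activation change is captured by shape-preserving maps, especially translation and rigid transformation with uniform scaling, while tier profiles reveal model- and task-specific routing strategies across layers. Crucially, although the translation and rigid tiers already improve behavioral agreement, the affine transformation is the first tier that almost fully recovers the target-prompt task geometry and yields corresponding behavioral gains. This indicates that cross-dimensional linear mixing is a key mechanism by which prompts reorganize representations into the instructed task structure. Our framework decomposes prompt-induced representational change into interpretable geometric components and reveals how models route task-relevant structure to produce prompt-driven behavior.

English abstract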

Prompting steers large language models (LLMs) and vision-language models (VLMs) without weight updates, but it remains unclear how instruction changes reshape internal representations to produce behavior. We introduce a nested geometric decomposition framework that treats prompting as a transformation of the representational geometry of the content following the prompt. For each prompt pair, we align representations of the same stimuli under two prompts using increasingly expressive stimulus-invariant maps: translation, rigid transformation with uniform scaling, sequential axis scaling, affine transformation, and nonlinear transformation. We then causally test each map by replacing a single layer's prompt-A hidden state for held-out stimuli with its mapped counterpart and measuring recovery of prompt-B representational geometry and behavior. Across three LLMs, three VLMs, and six text or image datasets spanning style, emotion, scene content, and number, prompts consistently reshape representations toward the instructed task structure. Cross-validated variance decomposition shows that much prompt-induced activation change is captured by shape-preserving maps, especially translation and rigid transformation with uniform scaling, while tier profiles reveal model- and task-specific routing strategies across layers. Crucially, although translation and rigid tiers already improve behavioral agreement, affine transformation is the first tier to nearly recover target-prompt task geometry and yields corresponding behavioral gains. This suggests that cross-dimensional linear mixing is a key mechanism by which prompts reorganize representations toward instructed task structure. Our framework decomposes prompt-induced representational change into interpretable geometric components and reveals how models route task-relevant structure to produce prompt-driven behavior.
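
To make the idea of nested alignment maps concrete, the following is a minimal sketch and not the authors' implementation. It fits only two simplified tiers, a pure translation and a translation with one uniform scale factor (rotation, axis scaling, affine, and nonlinear tiers are omitted), and compares their squared residuals. The function names and the toy 2-D "representations" are assumptions made for illustration.

```python
def mean(points):
    """Per-dimension mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def fit_translation(a, b):
    """Tier 1 (assumed form): b ~ a + t, with t the difference of means."""
    ma, mb = mean(a), mean(b)
    return 1.0, [y - x for x, y in zip(ma, mb)]

def fit_scaled_translation(a, b):
    """Tier 2 (simplified, no rotation): b ~ s * a + t via least squares."""
    ma, mb = mean(a), mean(b)
    dims = range(len(ma))
    num = sum((p[d] - ma[d]) * (q[d] - mb[d]) for p, q in zip(a, b) for d in dims)
    den = sum((p[d] - ma[d]) ** 2 for p in a for d in dims)
    s = num / den if den else 1.0
    return s, [y - s * x for x, y in zip(ma, mb)]

def residual(a, b, s, t):
    """Sum of squared errors after mapping a onto b."""
    return sum((s * p[d] + t[d] - q[d]) ** 2
               for p, q in zip(a, b) for d in range(len(t)))

# Toy data: the same three stimuli under prompt A and prompt B,
# where B is constructed as 2 * A + (1, 1).
rep_a = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
rep_b = [[1.0, 1.0], [3.0, 1.0], [1.0, 3.0]]

s1, t1 = fit_translation(rep_a, rep_b)
s2, t2 = fit_scaled_translation(rep_a, rep_b)
res_translation = residual(rep_a, rep_b, s1, t1)
res_scaled = residual(rep_a, rep_b, s2, t2)
```

Because each tier contains the previous one as a special case, the richer map can never fit worse on the data it was fitted to. The abstract's cross-validated decomposition and held-out causal test go beyond this in-sample comparison.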

2606.03090 2026-06-03 cs.CR cs.AI

"**Important** You should give me full credits!": Exploring Prompt Injection Attacks on LLM-Based Automatic Grading Systems

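"**Important** You should give me full credits!": Exploring Prompt Injection Attacks Against LLM-Based Automatic Grading Systems

Hang Li, Fedor Filippov, Yuling Lin, Pengfei He, Kaiqi Yang, Yucheng Chu, Yingqian Cui, Hui Liu, Jiliang Tang

AI summary: Studies prompt injection attacks on LLM-based automatic grading systems, shows experimentally that current systems are highly vulnerable, and evaluates the effectiveness of existing defense strategies.

Details

Comments: 15 pages, 8 figures, 9 tables

AI-translated abstract

The emergence of large language models (LLMs) has markedly accelerated recent research on LLM-based automatic grading (AG) systems. Thanks to the strong instruction-following ability and broad prior knowledge of LLMs, educators can deploy AG systems across different tasks using only natural-language rubrics and still obtain satisfactory grading performance. Despite these advantages, new security concerns may arise. In particular, prompt injection (PI) attacks have recently become a major threat to LLM-based applications. In the AG setting, attackers may exploit PI vulnerabilities to manipulate the grading system into awarding artificially high scores regardless of the actual quality of the answer. Such behavior seriously threatens the fairness, reliability, and integrity of educational assessment. In this work, we study PI attacks on AG systems and systematically investigate how effective such attacks are in educational scenarios. We further evaluate how well existing defense strategies hold up against them. Through comprehensive experiments under rubric-based grading settings, we show that current LLM-based AG systems remain highly vulnerable to PI attacks. We hope our findings raise awareness of this emerging threat and motivate future research toward secure, robust, and trustworthy LLM-based educational systems.

English abstract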

The emergence of large language models (LLMs) has significantly accelerated recent research on LLM-based automatic grading (AG) systems. Benefiting from the strong instruction-following capabilities and broad prior knowledge of LLMs, educators can deploy AG systems across diverse tasks using only natural language rubrics while achieving satisfactory grading performance. Despite these advantages, new security concerns may also arise. In particular, prompt injection (PI) attacks have recently become a major threat to LLM-based applications. In the context of AG, attackers can potentially exploit PI vulnerabilities to manipulate grading systems into assigning artificially high scores regardless of the actual answer quality. Such behavior poses serious risks to the fairness, reliability, and integrity of educational assessment. In this work, we study PI attacks in AG systems, and systematically investigate the effectiveness of such attacks in educational scenarios. We further evaluate the effectiveness of existing defensive strategies against these attacks. Through comprehensive experiments under rubric-based grading settings, we demonstrate that current LLM-based AG systems remain highly vulnerable to PI attacks. We hope that our findings raise awareness of this emerging threat and motivate future research toward secure, robust, and trustworthy LLM-based educational systems.

2606.03089 2026-06-03 cs.LG cs.AI

Constitutional On-Policy Safe Distillation

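Constitutional On-Policy Safe Distillation

Ming Wen, Yuxuan Liu, Kun Yang, Yunhao Feng, Zhuoer Xu, Yuhao Sun, Shiwen Cui, Xiang Zheng, Xingjun Ma, Yu-Gang Jiang

AI summary: On-policy self-distillation for safety alignment suffers because constitutional conditioning contracts the teacher distribution and reduces expressiveness. The paper proposes Constitutional On-Policy Safe Distillation (COPSD), which calibrates the teacher distribution with a Cross-SFT cold start and then performs constitution-conditioned on-policy distillation, achieving a better safety-helpfulness trade-off and a lower safety tax on 12 benchmarks.

Details

AI-translated abstract

On-policy self-distillation (OPSD) has become an efficient post-training paradigm by using a teacher conditioned on privileged information to provide dense token-level supervision. Prior work has shown that OPSD can collapse on verifiable reasoning tasks, but safety alignment is different: it is guided by high-level constitutions rather than explicit target answers, which makes it a natural setting in which to revisit dense distillation. However, our pilot study shows that safety OPSD still collapses severely: constitutional conditioning contracts the teacher distribution toward short, overly conservative responses, and reverse KL further amplifies this contraction into reduced expressiveness. We formalize this effect as geometric leakage under safety boundaries in a non-orthogonal semantic space, in which safety pressure transfers to the expressiveness dimension. Based on this analysis, we propose Constitutional On-Policy Safe Distillation (COPSD), which first calibrates the teacher through a Cross-SFT cold start and then performs constitution-conditioned on-policy distillation. Experiments on 12 benchmarks show that COPSD consistently achieves a stronger safety-helpfulness trade-off than the baselines while substantially reducing the safety tax on general reasoning ability.

English abstract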

On-policy self-distillation (OPSD) has emerged as an efficient post-training paradigm by using a teacher conditioned on privileged information to provide dense token-level supervision. Prior work has shown that OPSD can collapse in verifiable reasoning tasks, but safety alignment differs in that it is guided by high-level constitutions rather than explicit target answers, making it a natural setting to revisit dense distillation. However, our pilot study shows that safety OPSD still suffers from severe collapse: constitutional conditioning contracts the teacher distribution toward short and overly conservative responses, and reverse KL further amplifies this contraction into reduced expressiveness. We formalize this effect as geometric leakage under safety boundaries in a non-orthogonal semantic space, where safety pressure transfers into the expressiveness dimension. Based on this analysis, we propose Constitutional On-Policy Safe Distillation (COPSD), which first calibrates the teacher through a Cross-SFT cold-start and then performs constitution-conditioned on-policy distillation. Experiments on 12 benchmarks show that COPSD achieves a consistently stronger safety-helpfulness trade-off than baselines while substantially reducing the safety tax on general reasoning ability.
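
As a rough illustration of why the direction of the KL divergence matters here, the sketch below compares reverse KL, KL(student || teacher), with forward KL on a single toy next-token distribution. It is a generic textbook computation, not the paper's training objective; the three-token distributions and the helper name `kl` are assumptions for illustration.

```python
import math

def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

# Assumed toy distributions over three candidate tokens.
# The teacher is "contracted": almost all mass sits on one token.
teacher = [0.90, 0.08, 0.02]
# The student still spreads mass over more expressive alternatives.
student = [0.40, 0.35, 0.25]

reverse_kl = kl(student, teacher)  # expectation under the student (on-policy)
forward_kl = kl(teacher, student)  # expectation under the teacher
```

Reverse KL charges the student heavily for placing mass where the teacher has almost none, so minimizing it pushes the student toward the teacher's narrow mode. This is consistent with the contraction the abstract describes, although the abstract does not state the exact loss used.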

2606.03087 2026-06-03 cs.LG

Learning to Solve, Forgetting to Retain: Correct-Set Turnover in RLVR

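Learning to Solve, Forgetting to Retain: Correct-Set Turnover in RLVR

Chuanyu Qin, Chenxu Yang, Qingyi Si, Naibin Gu, Peng Fu, Zheng Lin

AI summary: To address models forgetting previously solved problems in reinforcement learning with verifiable rewards (RLVR), the paper identifies the correct-set turnover phenomenon and the repair-window principle, and designs \method{}, a retention-aware review mechanism that improves performance on multimodal tasks through pre-rollout batch replacement with zero additional overhead.

Details

AI-translated abstract

Reinforcement learning with verifiable rewards (RLVR) improves the abilities of large language models, yet gains in headline accuracy often mask a hidden cost: as training proceeds, previously solved problems quietly become unsolvable. We define this phenomenon as correct-set turnover, the coupled dynamics of solution acquisition and regression over the mastered set. From this perspective, retention becomes an explicit optimization target alongside acquisition. We establish the repair-window principle analytically and empirically: the cost of restoring a regressed prompt rises sharply with review delay, which defines a low-cost window that standard RLVR pipelines fail to exploit. To address this, we propose \method{}, a retention-aware review mechanism that tracks mastered prompts and periodically reintroduces them to remind the model of its earlier solutions. By using pre-rollout batch replacement, \method{} adds zero extra rollout overhead. Evaluated with Qwen3-VL and Qwen2.5-Math on 20 benchmarks covering image-text, video, and text-only tasks, \method{} consistently improves on GRPO, DAPO, and replay baselines, demonstrating robust generalization across modalities and algorithms.

English abstract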

Reinforcement learning with verifiable rewards (RLVR) improves the abilities of large language models, yet headline accuracy gains often conceal a hidden cost: previously solved problems quietly become unsolvable as training proceeds. We frame this phenomenon as correct-set turnover, representing the coupled dynamics of solution acquisition and regression over the mastered set. Under this view, retention becomes an explicit optimization target alongside acquisition. We analytically and empirically establish the repair-window principle: the cost of restoring a regressed prompt grows sharply with review delay, defining a low-cost window that standard RLVR pipelines fail to exploit. To address this, we propose \method{}, a retention-aware review mechanism that tracks mastered prompts and periodically reintroduces them to remind the model of previous solutions. By utilizing pre-rollout batch replacement, \method{} incurs zero additional rollout overhead. Evaluated across 20 benchmarks spanning image-text, video, and text-only tasks with Qwen3-VL and Qwen2.5-Math, \method{} consistently improves performance over GRPO, DAPO, and replay baselines, demonstrating robust generalizability across modalities and algorithms.
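
The bookkeeping behind correct-set turnover can be sketched in a few lines. This is an illustrative reading of the abstract, not the authors' code: the function name, the dictionary keys, and the prompt IDs are all hypothetical, and "solved" is assumed to mean that a prompt passed the verifier at a given checkpoint.

```python
def correct_set_turnover(solved_before, solved_after):
    """Compare the sets of solved prompts at two checkpoints."""
    before, after = set(solved_before), set(solved_after)
    return {
        "acquired": after - before,    # newly solved prompts
        "regressed": before - after,   # previously solved, now failing
        "retained": before & after,    # solved at both checkpoints
        "net_gain": len(after) - len(before),
    }

# Hypothetical prompt IDs solved at two successive checkpoints.
stats = correct_set_turnover({"p1", "p2", "p3"}, {"p2", "p3", "p4", "p5"})
```

In this toy case the headline count rises by one, while one previously solved prompt has regressed. That regression is invisible if only aggregate accuracy is tracked.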

2606.03085 2026-06-03 cs.LG cs.CL

Multi-component Causal Tracing in Large Language Models

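Multi-Component Causal Tracing in Large Language Models

Zirui Yan, Dennis Wei, Dmitriy A. Katz, Prasanna Sattigeri, Ali Tajer

AI summary: Proposes a unified framework that uses soft interventions and a metric transformation to efficiently identify the multi-component subsets most critical to a target performance metric, outperforming existing baselines.

Details

Comments: Accepted to ACL 2026 main conference

AI-translated abstract

Causal tracing systematically intervenes on the internal representations of a large language model (LLM) to uncover and quantify the causal pathways that link specific inputs or computations to specific metrics of interest, thereby quantifying the LLM's behavior. Building on earlier single-component or single-layer studies, this paper presents a unified framework for causally tracing multiple components simultaneously. The framework systematically identifies the subsets of components (e.g., attention heads and multi-layer perceptron neurons) that are most critical to a desired target performance metric (e.g., accuracy and fairness). It does so by applying flexible interventions to a wide range of desired metrics. To handle the combinatorial complexity of the multi-component problem, an efficient algorithm is designed that uses soft interventions and a carefully designed metric transformation to turn the combinatorial search problem into a continuous one, which can be solved efficiently under suitable constraints and then yields proper binary decisions for selecting components. Experimental results show that the proposed method efficiently identifies subsets of model components with high impact on the target metric, outperforming existing baselines. Our code is available at https://github.com/ZiruiYan/multi-component-causal-tracing.

English abstract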

Causal tracing systematically intervenes on a large language model's (LLM's) internal representations to uncover and quantify the causal pathways linking specific inputs or computations to specific metrics of interest, quantifying the LLM's behavior. Building on previous single-component or single-layer studies, this paper presents a unified framework for causally tracing multiple components simultaneously. This framework systematically identifies the subsets of components (e.g., attention heads and multi-layer perceptron neurons) most critical to a desired target performance metric (e.g., accuracy and fairness). This is achieved by incorporating flexible interventions applied to a wide range of desired metrics. To address the combinatorial complexity of the multi-component problem, an efficient algorithm is designed that leverages soft interventions and a carefully designed metric transformation, converting the combinatorial search problem into a continuous one that can be solved efficiently under proper constraints, thereby generating proper binary decisions for selecting components. Experimental results demonstrate that the proposed method efficiently identifies subsets of the model's components that have a high impact on the target metric, outperforming existing baseline approaches. Our code is available at https://github.com/ZiruiYan/multi-component-causal-tracing.

2606.03084 2026-06-03 cs.CV

Hierarchical Federated Learning with Dynamic Clustering and Adaptive Regularization for Robust Infrastructure Inspection

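Hierarchical Federated Learning with Dynamic Clustering and Adaptive Regularization for Robust Infrastructure Inspection

Yuhu Feng, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama

AI summary: Proposes a hierarchical federated learning framework that addresses data heterogeneity in infrastructure inspection through macro-level dynamic gradient clustering and micro-level adaptive regularization, yielding robust and specialized diagnostic models.

Details

AI-translated abstract

Because of strict privacy and security regulations, the use of data-driven computer vision models in structural health monitoring (SHM) is severely constrained by the data silo dilemma. Although federated learning (FL) offers a privacy-preserving collaborative alternative, its application to nationwide infrastructure networks is seriously hampered by the challenge of "double heterogeneity": macro-level physical differences between structural types and micro-level statistical imbalance within local datasets. To overcome this challenge, this paper proposes a novel hierarchical federated learning framework that coordinates a synergistic two-tier optimization strategy. At the macro level, a dynamic gradient-based clustering mechanism automatically groups distributed clients into specialized expert groups according to their structural degradation trajectories, with no need for prior geographical metadata. At the micro level, an intra-cluster Dynamic Region-Adaptive Proximal Regularization (DRAPR) module computes a real-time statistical Non-IID Intensity Score for each client. By adaptively adjusting a proximal penalty based on local label skew and gradient divergence, DRAPR effectively calibrates local updates, mitigates client drift, and prevents catastrophic forgetting of minority damage classes. Comprehensive evaluation on a large-scale, real-world structural inspection dataset shows that the hierarchical integration of macro-clustering and micro-regularization successfully neutralizes dual-level heterogeneity and produces highly robust, specialized diagnostic models for complex infrastructure inspection.

English abstract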

The deployment of data-driven computer vision models for structural health monitoring (SHM) is heavily constrained by the data silo dilemma due to stringent privacy and security regulations. While federated learning (FL) offers a privacy-preserving collaborative alternative, its application to nationwide infrastructure networks is severely hindered by the challenge of "double heterogeneity": macro-level physical divergence across disparate structural types and micro-level statistical imbalances within local datasets. To overcome this challenge, this paper proposes a novel hierarchical federated learning framework. The framework orchestrates a synergistic two-tier optimization strategy. At the macro level, a dynamic gradient-based clustering mechanism autonomously aggregates distributed clients into specialized expert groups based on their structural degradation trajectories, circumventing the need for prior geographical metadata. Concurrently, at the micro level, an intra-cluster Dynamic Region-Adaptive Proximal Regularization (DRAPR) module computes a real-time statistical Non-IID Intensity Score for each client. By adaptively modulating a proximal penalty based on local label skewness and gradient divergence, DRAPR effectively calibrates local updates, mitigates client drift, and prevents the catastrophic forgetting of minority damage classes. Comprehensive evaluations on a large-scale, real-world structural inspection dataset demonstrate that the hierarchical integration of macro-clustering and micro-regularization successfully neutralizes dual-level heterogeneity, yielding highly robust and specialized diagnostic models for complex infrastructure inspection.
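
The abstract does not give DRAPR's formula, so the following is only a hedged sketch of the general pattern it describes: a FedProx-style proximal penalty whose strength grows with a client's label skew. The skew score (one minus normalized label entropy), the scaling rule in `adaptive_mu`, and all names are assumptions for illustration, and the gradient-divergence term mentioned in the abstract is left out.

```python
import math

def label_skew(counts):
    """Hypothetical skew score in [0, 1]: 1 - normalized label entropy."""
    total = sum(counts)
    k = len(counts)
    if total == 0 or k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)

def adaptive_mu(base_mu, counts):
    """Assumed rule: more skewed clients get a stronger proximal pull."""
    return base_mu * (1.0 + label_skew(counts))

def proximal_penalty(local_w, global_w, mu):
    """FedProx-style term (mu / 2) * ||w_local - w_global||^2."""
    return 0.5 * mu * sum((l - g) ** 2 for l, g in zip(local_w, global_w))

# A balanced client and a client that only ever sees one damage class.
mu_balanced = adaptive_mu(0.1, [10, 10, 10])
mu_skewed = adaptive_mu(0.1, [30, 0, 0])
penalty = proximal_penalty([1.0, 2.0], [0.0, 0.0], mu_skewed)
```

The penalty would be added to each client's local loss, so a heavily skewed client is held closer to the cluster model than a balanced one.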

Disentangling Visual and Factual Correctness in LVLMs' Visualization Literacy

Think-Before-Speak: From Internal Evaluation to Public Expression in Multi-Agent Social Simulation

PsychoPass: Geometric Profiling of Multi-Turn Adversarial LLM Conversations

Uncertainty-Aware Clarification in LLM Agents with Information Gain

How Visible Are Silent Manipulation Failures? An Observability Study of False-Success Detection in Simulated Robot Episodes

DMT-CBT: Longitudinal Therapeutic State Modeling for CBT Counseling

HARVE: Hacking-Aware Reward-Head Vector Editing for Robust Reward Models

Synthetic Hallucinations, Real Gains: Hard Negatives from Frontier Models for FIM Hallucination Mitigation

Decoupled Smart Contract Audits: Lightweight LLM Framework via Distillation and Aggregation

TTT-VLA: Test-Time Latent Prompt Optimization for Vision-Language-Action Models

Rethinking Neural Width for Alternating Current Optimal Power Flow Proxies

TiWeaver: Unified Temporal Dynamics Modeling via Contextual Patching

KC-3DGS: Kurtosis-Constrained Gaussian Splatting for High-Fidelity View Synthesis

GuidedBridge: Training-freely Improving Bridge Models with Prior Guidance

Learning to See via Epiretinal Implant Stimulation in silico with Model-Based Deep Reinforcement Learning

FAF-CD: Frequency-Aware Fusion for Change Detection under Imperfect Multimodal Remote Sensing

Experience-Driven Dynamic Exits for LLMs with Reinforcement Learning

Inverting the Generation Process of Denoising Diffusion Implicit Models: Empirical Evaluation and a Novel Method

EvoTrainer: Co-Evolving LLM Policies and Training Harnesses for Autonomous Agentic Reinforcement Learning

DeskCraft: Benchmarking Desktop Agents on Professional Workflows and Human-in-the-Loop Collaboration

Small RL Controller, Large Language Model: RL-Guided Adaptive Sampling for Test-Time Scaling

PhotoCraft: Agentic Reasoning with Hierarchical Self-Evolving Memory for Deep Image Search

From Long News to Accurate Forecast: Importance-Aware Fusion and PRM-Guided Reflection for Time Series Forecasting

FGRPO: Federated GRPO with Adaptive Aggregation on Non-IID Data

Decomposing how prompting steers behavior

"**Important** You should give me full credits!": Exploring Prompt Injection Attacks on LLM-Based Automatic Grading Systems

Constitutional On-Policy Safe Distillation

Learning to Solve, Forgetting to Retain: Correct-Set Turnover in RLVR

Multi-component Causal Tracing in Large Language Models

Hierarchical Federated Learning with Dynamic Clustering and Adaptive Regularization for Robust Infrastructure Inspection