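arXivDaily: a daily academic digest synced with the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, and electrical engineering and systems science.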

2606.02580 2026-06-02 cs.CV

Thinking in Blender: Staged Executable Inverse Graphics with Vision-Language Models

Guangzhao He, Rundong Luo, Wei-Chiu Ma, Hadar Averbuch-Elor

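AI summary: Proposes Staged Executable Inverse Graphics (SEIG), a framework that uses pretrained vision-language models to reconstruct an editable Blender program directly from a single image, without specialized foundation models or differentiable rendering, and improves reconstruction fidelity by progressively refining geometry, materials, composition, and lighting.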

English abstract

Inverse graphics is a longstanding and highly underconstrained problem that seeks to reconstruct images as editable 3D scenes which can be rendered, relit, and manipulated. In this work, we investigate whether pretrained vision-language models (VLMs) can perform executable inverse graphics directly from a single image by reconstructing a scene as an editable Blender program, without relying on specialized 2D or 3D foundation models, differentiable rendering, or multi-view supervision. We introduce Staged Executable Inverse Graphics (SEIG), an agentic framework that reconstructs a 3D scene from a single image by progressively refining scene factors including geometry, materials, composition, and lighting directly in executable Blender code space. We evaluate our framework across diverse scenes using a range of reconstruction metrics spanning pixel-level, perceptual, and semantic fidelity. Our experiments show that staged reconstruction substantially improves reconstruction fidelity, highlighting the importance of task decomposition for executable inverse graphics with general-purpose VLMs. Finally, we showcase various downstream applications enabled by the reconstructed editable Blender scenes.

2606.02578 2026-06-02 cs.CV cs.AI

Mitigating Perceptual Judgment Bias in Multimodal LLM-as-a-Judge via Perceptual Perturbation and Reward Modeling

Seojeong Park, Jiho Choi, Junyong Kang, Seonho Lee, Jaeyo Shin, Hyunjung Shim

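AI summary: By building a perceptually perturbed dataset and a unified training framework that combines a GRPO-based reward with a batch-ranking objective, this paper addresses the perceptual judgment bias that arises when visual evidence conflicts with textual cues in multimodal LLM judges, substantially improving perceptual fidelity and alignment with human evaluation.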

Comments: ICML 2026

English abstract

Recent multimodal large language models have demonstrated strong reasoning ability, yet their reliability as automated evaluators remains limited by a critical weakness: when visual evidence conflicts with textual cues, MLLM judges tend to reward plausible narratives over perceptually correct answers. We identify and systematically analyze this phenomenon, which we term Perceptual Judgment Bias. Through controlled visual perturbations, existing multimodal judges frequently anchor on the response text instead of their own visual perception, leading to inconsistent and non-verifiable evaluations. To address this issue, we introduce the Perceptually Perturbed Judgment Dataset, which constructs minimally edited counterfactual responses that isolate perceptual errors and enable verifiable supervision. Building on this dataset, we develop a unified training framework that combines a structured GRPO-based reward with a batch-ranking objective, achieving coherent global ordering without explicit pairwise labels. Experiments across diverse MLLM-as-a-Judge benchmarks show that our approach substantially improves perceptual fidelity, ranking coherence, and alignment with human evaluation. Our results establish a scalable and generalizable pathway for training multimodal judges that are perceptually grounded, interpretable, and robust to visual-reasoning conflicts.

2606.02577 2026-06-02 cs.RO cs.CV

RoboDream: Compositional World Models for Scalable Robot Data Synthesis

Junjie Ye, Rong Xue, Basile Van Hoorick, Runhao Li, Harshitha Rajaprakash, Pavel Tokmakov, Muhammad Zubair Irshad, Vitor Guizilini, Yue Wang

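AI summary: Proposes an embodiment-centric compositional world model that decouples trajectory execution from environment synthesis to synthesize photorealistic demonstrations with novel viewpoints, scenes, and objects, and shows that it is effective for scaling data and reducing real-world data requirements.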

Comments: Project page: https://junjieye.com/RoboDream/

English abstract

Scaling robot learning requires large-scale, diverse demonstrations, yet real-world data collection via teleoperation remains prohibitively expensive and time-consuming. While video diffusion models offer a promising avenue for data scaling, existing generative approaches are often limited to superficial visual augmentation, or suffer from embodiment hallucinations that yield physically infeasible motions. We present a generalizable embodiment-centric world model that achieves scalable data generation by synthesizing photorealistic demonstrations with novel objects, in novel scenes, and from novel viewpoints. Our approach anchors generation to rendered robot motion while conditioning on explicit scene and object priors, effectively decoupling trajectory execution from environment synthesis. This formulation has the potential to unlock two powerful data scaling capabilities: (1) retrieval and rebirth, which repurposes existing trajectories into entirely new contexts without new motion data; and (2) prop-free teleoperation, where operators manipulate empty air and the model hallucinates the target objects and scene afterwards, eliminating reset time. We demonstrate with real-world experiments that our generated data consistently improves downstream policy performance and significantly reduces real-world data requirements across diverse manipulation tasks.

2606.02575 2026-06-02 cs.CV

From Zero to Hero: Training-Free Custom Concept Spawning in World Models

Kiymet Akdemir, Pinar Yanardag

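AI summary: Proposes SPAWN, a training-free method that exploits a structural property of image-to-video backbones, swapping the reference-frame anchor with an external concept latent to spawn user-specified visual concepts in world models.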

English abstract

Autoregressive world models have emerged as a powerful paradigm for interactive video generation, allowing users to navigate dynamically generated environments through actions. These models are typically conditioned on a text prompt and/or a single reference frame, from which the entire world is generated. Yet the moment the user navigates beyond what is visible in that frame, the unseen regions are populated by the base model's priors, with no mechanism for the user to specify what should appear and where. This is a fundamental limitation for applications such as gaming, interactive storytelling, and simulation, where controllable scene composition is essential. We refer to this missing capability as concept spawning; introducing a user-specified visual concept into a world model, analogous to spawning in a game engine. We introduce SPAWN (Swapping Pinned Anchor with Windowed iNjection), a training-free method for concept spawning. SPAWN exploits a structural property of image-to-video backbones: the first slot of the context memory is pinned to the reference frame and acts as a foundational anchor for every generated chunk. By swapping this anchor with an external concept latent over a short injection window and letting the original anchor return, we cause the concept to propagate naturally through the rollout via the model's own memory. SPAWN supports concepts from fine-grained entities such as characters and props to large-scale elements such as buildings and landmarks, and accepts either a concept image or a text description as input. Experiments show that SPAWN integrates concepts with consistent lighting, scale, and perspective while preserving identity and temporal coherence, demonstrating that controllable concept spawning is achievable in existing autoregressive world models without any training.

2606.02573 2026-06-02 cs.CV

HumanNOVA: Photorealistic, Universal and Rapid 3D Human Avatar Modeling from a Single Image

Hezhen Hu, Wangbo Zhao, Lanqing Guo, Hanwen Jiang, Jonathan C. Liu, Zhiwen Fan, Kai Wang, Zhangyang Wang, Georgios Pavlakos

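AI summary: Proposes HumanNOVA, which uses a scalable data generation pipeline and a feed-forward, token-conditioned architecture to rapidly generate photorealistic 3D human avatars from a single RGB image without test-time optimization.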

Comments: CVPR 2026 Highlight

English abstract

In this paper, we present HumanNOVA, a photorealistic, universal, and rapid model for generating 3D human avatars from a single RGB image. Achieving both photorealism and generalization is challenging due to the scarcity of diverse, high-quality 3D human data. To address this, we build a scalable data generation pipeline that follows two strategies. The first one is to leverage existing rigged assets and animate them with extensive poses from daily life. The second strategy is to utilize existing multi-camera captures of humans and employ fitting to generate more diverse views for training. These two strategies enable us to scale up to 100k assets, significantly enhancing both the quantity and the diversity of data for robust model training. In terms of the architecture, HumanNOVA adopts a feed-forward, token-conditioned avatar modeling framework that allows fast inference in less than one second and requires no test-time optimization. Given an input image and an estimated simplified human mesh (SMPL) without detailed geometry or appearance, the model first encodes both inputs into compact token representations. These tokens then act as conditioning signals and are fused through cross-attention to construct a triplane-based 3D avatar representation. Extensive experiments on multiple benchmarks demonstrate the superiority of our approach, both quantitatively and qualitatively, as well as its robustness under diverse input image conditions. Project page: https://HumanNOVA.github.io

2606.02572 2026-06-02 cs.CV

VISReg: Variance-Invariance-Sketching Regularization for JEPA training

Haiyu Wu, Randall Balestriero, Morgan Levine

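AI summary: Proposes VISReg, a regularization method that replaces the covariance term with a sketching objective based on the sliced Wasserstein distance to strengthen constraints on distributional shape, preventing embedding collapse while improving robustness and performance.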

English abstract

Self-supervised learning methods prevent embedding collapse via modeling heuristics or explicit regularization of the embedding space. Among the latter, VICReg decomposes regularization into variance and covariance objectives, offering flexibility and interpretability. However, covariance captures only second-order statistics -- encouraging decorrelation but failing to enforce the full distributional shape needed for stable training. Sketching-based methods such as SIGReg address this by aligning embeddings to an isotropic Gaussian, but lack flexibility and suffer from vanishing gradients under collapse. We propose Variance-Invariance-Sketching Regularization (VISReg), which replaces covariance with a Sliced-Wasserstein-based sketching objective that enforces full distributional shape, while retaining a variance term for scale control. By decoupling scale and shape, VISReg combines VICReg's flexibility with the distributional rigor of sketching methods, providing robust gradients even under collapse. We show that VISReg scales linearly, outperforms existing regularization on low-quality datasets, and is resilient to long-tailed and low-rank regimes. Pre-trained on ImageNet-1K, VISReg achieves state-of-the-art performance on out-of-distribution datasets. Pre-trained on ImageNet-22K, it matches DINOv2's OOD performance despite the latter using 10x more data (LVD-142M). Project and code: https://haiyuwu.github.io/visreg.
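
As a rough illustration of what a sliced-Wasserstein sketching term measures, the sketch below projects embeddings onto random unit directions and compares the sorted projections with standard normal quantiles. It is not the authors' loss: the standard normal target, the number of slices, and the function name `sliced_wasserstein_to_gaussian` are assumptions made for this example, and the variance term mentioned in the abstract is left out.

```python
import math
import random
from statistics import NormalDist


def sliced_wasserstein_to_gaussian(embeddings, num_slices=16, seed=0):
    """Mean squared 1D Wasserstein-2 distance between random projections
    of the embeddings and N(0, 1). Hypothetical helper, not the paper's API."""
    rng = random.Random(seed)
    n, dim = len(embeddings), len(embeddings[0])
    # Standard normal quantiles at the midpoints (i + 0.5) / n.
    target = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    total = 0.0
    for _ in range(num_slices):
        # Draw a random direction and normalize it to unit length.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(d * d for d in direction))
        direction = [d / norm for d in direction]
        # Project every embedding onto the direction and sort the results.
        projected = sorted(
            sum(x * d for x, d in zip(row, direction)) for row in embeddings
        )
        # In 1D, W2^2 between equal-size samples is the mean squared gap
        # between sorted values.
        total += sum((p - t) ** 2 for p, t in zip(projected, target)) / n
    return total / num_slices
```

Under these assumptions, a collapsed batch (all embeddings identical) sits far from the target, while Gaussian-like embeddings give a much smaller value.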

2606.02569 2026-06-02 cs.CV cs.AI cs.CL

AdaCodec: A Predictive Visual Code for Video MLLMs

Haowen Hou, Zhen Huang, Zheming Liang, Qingyi Si, Chenglin Li, Shuai Dong, Kele Shao, Ruilin Li, Dianyi Wang, Nan Duan, Jiaqi Wang

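AI summary: To address redundancy between video frames, proposes AdaCodec, a predictive visual code that uses the conditional predictive cost to decide whether to send a full reference frame or compact P-tokens, improving performance at a matched visual-token budget and substantially reducing time-to-first-token.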

Comments: 23 pages

English abstract

Video is temporally redundant: adjacent frames usually share most objects, background, and layout. Yet existing video multimodal large language models (video MLLMs) usually encode each sampled frame as an independent RGB image, causing visual tokens to repeat content already present in earlier frames. This suggests a more direct video interface: send a full reference frame only when the scene cannot be predicted well from prior context, and otherwise transmit a compact description of inter-frame changes. We call this interface a predictive visual code, and instantiate it for video MLLMs as AdaCodec. AdaCodec spends full visual tokens on a reference frame only when its conditional predictive cost is high; otherwise, it encodes inter-frame changes, including motion and prediction residuals, as compact P-tokens. Across all eleven benchmarks, AdaCodec improves over the Qwen3-VL-8B per-frame RGB baseline at a matched visual-token budget. Even at 1/7 the budget, AdaCodec with 32k tokens surpasses the 224k baseline on all long-video benchmarks; on five general-video benchmarks, it raises the average score while substantially cutting time-to-first-token from 9.26s to 1.62s.
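
The selection rule described in the abstract can be pictured with a toy planner. This is a sketch under stated assumptions, not AdaCodec itself: the per-frame cost values, the fixed threshold, the token counts, and the names `plan_frames` and `total_tokens` are all hypothetical, and the first frame is assumed to always be a reference frame.

```python
def plan_frames(costs, threshold, ref_tokens, p_tokens):
    """Label each frame as a full reference frame or a compact P-frame.

    costs[i] is a hypothetical predictive cost of frame i given earlier
    frames; frames above the threshold get the full token budget.
    """
    plan = []
    for index, cost in enumerate(costs):
        # Assumption: the first frame has no context, so it is a reference.
        is_reference = index == 0 or cost > threshold
        if is_reference:
            plan.append(("reference", ref_tokens))
        else:
            plan.append(("P", p_tokens))
    return plan


def total_tokens(plan):
    """Sum the visual tokens spent across the planned frames."""
    return sum(tokens for _, tokens in plan)


plan = plan_frames([0.0, 0.1, 0.9, 0.2], threshold=0.5, ref_tokens=256, p_tokens=32)
print(plan, total_tokens(plan))
```

With these made-up numbers, two of the four frames are sent as references and two as P-tokens, so the clip costs 576 tokens instead of the 1024 a per-frame encoding would use.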

2606.02568 2026-06-02 cs.AI cs.CL cs.ET cs.MA

ClinEnv: An Interactive Multi-Stage Long Horizon EHR Environment for Agents

Yuxing Lu, Yushuhong Lin, Wenqi Shi, J. Ben Tamo, Xukai Zhao, Jinzhuo Wang, May Dongmei Wang

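AI summary: Proposes ClinEnv, an interactive benchmark built on real inpatient data that uses multi-stage decision sequences to evaluate how well LLMs gather information incrementally and make irreversible decisions under uncertainty, finding that decision quality is sharply decoupled from process quality.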

Comments: 20 pages, 6 figures, 12 tables

English abstract

Clinical practice is not the selection of an answer from enumerated options: a physician gathers heterogeneous information incrementally and commits to sequential, irreversible decisions under uncertainty. Static benchmarks cannot probe these properties, and existing interactive medical benchmarks each compromise on at least one of them. We present ClinEnv, an interactive benchmark that evaluates LLMs as attending physicians over real inpatient admissions under a paradigm we term Longitudinal Inpatient Simulation. Each case is automatically constructed into an ordered sequence of decision stages; at every stage the model must actively query four specialized agents before committing to medications, procedures, and diagnoses. ClinEnv scores both what the model decides, through deterministic ontology-grounded matching, and how it gathers information. Across seven models, the strongest reaches only 0.31 decision F1, and outcome quality is sharply decoupled from process quality. Difficulty concentrates in management decisions and later stages, where models recover discharge diagnoses far more reliably than management actions (0.51 vs. 0.17 F1) and continue to issue redundant queries as cases progress. ClinEnv makes this information-acquisition gap, invisible to outcome-only evaluation, directly measurable.
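
The abstract reports decision quality as F1. A minimal set-based F1 can be computed as below; exact string matching stands in for the benchmark's ontology-grounded matching, and the function name `set_f1` and the example items are hypothetical.

```python
def set_f1(predicted, reference):
    """F1 between a predicted and a reference set of decisions.

    Assumption: items match only when their strings are identical, which is
    a simplification of ontology-grounded matching.
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    if true_positives == 0:
        # Covers empty inputs and the no-overlap case.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


score = set_f1(
    ["aspirin", "heparin", "insulin"],
    ["heparin", "insulin", "metformin", "warfarin"],
)
print(round(score, 3))
```

Here two of three predictions are correct (precision 2/3) and two of four reference items are recovered (recall 1/2), giving F1 = 4/7.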

2606.02567 2026-06-02 math.FA cs.IT math.IT

Strong Polarization and Entropy

Daniel Galicer, Oscar Ortega-Moreno, Damián Pinasco

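AI summary: This paper proves a weighted strong polarization inequality for unit vectors in a real Hilbert space, gives applications to polarization of products of linear functionals and to a strengthening of Bang's theorem, and reveals a connection between the inequality and Shannon entropy.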

2606.02565 2026-06-02 cs.CV

Policy-based Foveated Imaging and Perception

Howard Xiao, Jan Ackermann, Boyang Deng, Gordon Wetzstein

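AI summary: Proposes a real-time, predictive, task-aware foveated imaging system that uses a reinforcement-learning policy to dynamically allocate pixel bandwidth to task-relevant regions, achieving high task performance under strict pixel budgets.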

Comments: Project website at https://howardxiao.ca/foveated/

English abstract

Ultra-high-resolution image sensors offer the potential to capture fine spatial details critical for many visual perception tasks, but acquiring and processing all pixels at full resolution is often infeasible under realistic bandwidth, latency, and power constraints. Existing approaches address this challenge through acquisition strategies such as spatial or temporal downsampling, which irrevocably discard information before task relevance can be assessed. In this work, we introduce a real-time, predictive, and task-aware foveated imaging system that operates directly at image acquisition time. Leveraging emerging dual-stream sensor architectures, our method dynamically allocates limited pixel bandwidth to task-relevant regions of interest while maintaining a low-resolution global context. We formulate foveated acquisition as a sensor attention policy-learning problem, in which past observations guide actions that determine future measurements, closing the perception-acquisition loop. Through extensive simulation across multiple perception tasks, we demonstrate that our approach achieves high task performance under strict pixel budgets and significantly outperforms relevant baselines operating at the same bandwidth. We further validate our system on a 200-megapixel dual-stream sensor, capturing real-world videos under realistic bandwidth and latency constraints, demonstrating the practical feasibility of task-driven, acquisition-time foveated imaging.

2606.02564 2026-06-02 cs.CV

VLMs are Good Teachers for Video Reasoning via Adaptive Test-Time Optimization

Junhao Cheng, Liang Hou, Tianxiong Zhong, Xin Tao, Pengfei Wan, Kun Gai, Jing Liao

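AI summary: Proposes using vision-language models (VLMs) as "teachers" that extract task rules and design differentiable rewards, guiding a video generation model (VGM) to optimize a lightweight LoRA module online at test time and thereby improving the generalization of video reasoning.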

Comments: Project Page: https://VLM-as-Teacher.github.io/

English abstract

The recent "Reasoning with Video" paradigm utilizes Video Generation Models (VGMs) to generate temporally coherent visual trajectories to complete reasoning tasks. Although state-of-the-art VGMs excel at visual quality, they often struggle to understand and follow task-specific rules, leading to logical failures across diverse reasoning scenarios. Existing efforts try to utilize Vision-Language Models (VLMs) as problem pre-solvers to produce or refine textual guidance for the VGM. However, textual descriptions fail to capture intricate spatiotemporal details, and VGMs often struggle to faithfully execute fine-grained or long-tail instructions even with a valid plan. While VLMs struggle as solvers, they possess strong perception capabilities to evaluate process-constraint satisfaction and final-goal achievement. Leveraging this strength, we introduce a paradigm shift that transitions the role of VLMs to "teachers". Specifically, a VLM teacher extracts task-specific rules to formulate differentiable rewards, guiding a VGM Reasoner via test-time online optimization of a lightweight LoRA module. This strategy enables adaptive test-time optimization and extends the reasoning capabilities beyond the VGM's intrinsic boundaries. Evaluations on symbolic (VBVR-Bench) and general-purpose (RULER-Bench) video reasoning benchmarks show that the proposed method yields a 16.7-point average performance gain, outperforming the VLM-as-Solver paradigm (+0.4 points) and Best-of-N scaling (+2.2 points) by a large margin at comparable test-time cost. These findings reveal that integrating VLMs as test-time teachers offers a promising paradigm for achieving generalizable video reasoning. Project Page: https://VLM-as-Teacher.github.io/

2606.02563 2026-06-02 cs.LG cs.CR cs.DC

IntraShuffler: A Privacy Preserving Framework for Heterogeneous DP Federated Learning

Farhin Farhad Riya, Olivera Kotevska, Jinyuan Stella Sun

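AI summary: Targeting privacy inference attacks in heterogeneous differentially private federated learning, in which an honest-but-curious server infers client attributes from gradient structure, proposes IntraShuffler, a middleware framework whose privacy-aware shuffling mechanism disrupts persistent gradient structure while preserving ε-aware aggregation, reducing gradient recoverability by over 60% and lowering surrogate inference accuracy from 0.78 to 0.33.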

English abstract

Heterogeneous Differential Privacy (HDP) in Federated Learning (FL) allows clients to select individual privacy budgets ($\varepsilon_i$) according to institutional policies and data sensitivity. In practice, many HDP-FL systems employ $\varepsilon$-aware server aggregation to improve model utility by re-weighting client updates according to their declared privacy budgets. However, gradient updates in FL retain structural patterns induced by non-independent and identically-distributed (non-IID) data, and these additional signals exposed by $\varepsilon$-aware aggregation create new opportunities for inference by an honest-but-curious server. In this work, we first show that a server equipped with gradient denoising and surrogate modeling can mount a Privacy Inference Attack that infers distributional attributes of clients and links updates from the same client across training rounds, measured via surrogate inference accuracy and linkage success, under realistic knowledge constraints. The Shuffle-Model has been widely studied as a defense against such inference risks by anonymizing update sources, but it is fundamentally incompatible with HDP-FL $\varepsilon$-aware aggregation. To address this challenge, we propose IntraShuffler, a middleware defense framework designed for HDP-FL systems. IntraShuffler introduces a privacy-aware shuffling mechanism that groups clients into privacy-compatible buckets and performs parameter-level shuffling within each bucket to disrupt persistent gradient structure while preserving $\varepsilon$-aware aggregation. Experiments across four different datasets show that IntraShuffler reduces gradient recoverability by over 60% and decreases surrogate inference accuracy from 0.78 to 0.33 while maintaining comparable model utility across multiple FL aggregation rules.
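
One way to picture bucketed, parameter-level shuffling is the sketch below. It is an illustration under assumptions rather than the IntraShuffler mechanism: clients are treated as "privacy-compatible" only when their declared budgets are equal, each coordinate is permuted independently across the clients in a bucket, and the function name `intra_bucket_shuffle` is hypothetical.

```python
import random
from collections import defaultdict


def intra_bucket_shuffle(updates, epsilons, seed=0):
    """Shuffle each parameter coordinate across clients in the same bucket.

    updates: list of equal-length lists, one update vector per client.
    epsilons: declared privacy budget per client (assumed bucket key).
    """
    rng = random.Random(seed)
    # Group client indices by their declared privacy budget.
    buckets = defaultdict(list)
    for client, eps in enumerate(epsilons):
        buckets[eps].append(client)

    shuffled = [list(update) for update in updates]
    for members in buckets.values():
        for j in range(len(updates[0])):
            # Permute coordinate j among the clients of this bucket only.
            column = [updates[c][j] for c in members]
            rng.shuffle(column)
            for c, value in zip(members, column):
                shuffled[c][j] = value
    return shuffled
```

Because values only move between clients that share a budget, the multiset of values at each coordinate within a bucket is unchanged, so any aggregation that applies a single weight per bucket sees the same per-bucket sums as before the shuffle.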

2606.02562 2026-06-02 cs.RO cs.AI cs.LG cs.SY eess.SY

Permissive Safety Through Trusted Inference: Verifiable Belief-Space Neural Safety Filters for Assured Interactive Robotics

Haimin Hu

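AI summary: To address safety problems caused by human-induced uncertainty in interactive robotics, proposes a conformal-prediction-based method for verifying belief-space safety filters that guarantees high-probability safety while accounting for inference reliability and reduces conservativeness.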

Comments: Accepted to the 17th World Symposium on the Algorithmic Foundations of Robotics (WAFR 2026)

English abstract

Autonomous robots that interact with people must make safe and efficient decisions under human-induced uncertainty, such as their preferences, goals, competency, and willingness to cooperate. Safety filters are a popular approach for ensuring safety in interactive robotics, since their modular design separates safety from performance, allowing robots to operate safely around people with minimal impact on task efficiency. While traditional safety filters typically operate only in the physical space, neglecting the robot's ability to learn and adapt online, the recently proposed belief-space safety filter (BeliefSF) reasons about robot safety in closed-loop with runtime inference that actively reduces the robot's uncertainty online, thereby reducing conservativeness in filtering. However, providing formal safety guarantees for robots deploying BeliefSF remains a significant challenge due to errors in runtime inference and neural approximation of safety filters required to handle the high dimensionality of belief spaces. In this paper, we propose an algorithmic approach to certify high-probability safety of BeliefSF using conformal prediction, while explicitly accounting for the reliability of the robot's runtime inference module. Our method leverages the structure of belief-space safety filtering by focusing verification on a region where inference is expected to be reliable. It preserves the simplicity and sample complexity of standard conformal prediction, yet can certify a substantially less conservative safety filter. Through a simulated human-vehicle interaction benchmark, we show that our approach verifies a significantly more permissive belief-space safety filter than a standard conformal prediction baseline.
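
For readers unfamiliar with the baseline, standard split conformal prediction reduces to an order statistic of calibration scores. The sketch below shows that generic recipe only; it does not implement the paper's inference-aware verification, and `conformal_threshold` and the example scores are illustrative.

```python
import math


def conformal_threshold(calibration_scores, alpha):
    """Split conformal threshold at miscoverage level alpha.

    With n exchangeable calibration scores, the ceil((n + 1) * (1 - alpha))-th
    smallest score bounds a fresh score with probability at least 1 - alpha.
    """
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration samples for this alpha: no finite bound exists.
        return math.inf
    return sorted(calibration_scores)[rank - 1]


scores = [3, 1, 4, 1, 5, 9, 2, 6, 5]
print(conformal_threshold(scores, alpha=0.5))
print(conformal_threshold(scores, alpha=0.05))
```

The second call illustrates the sample-complexity side of the method: nine calibration points cannot certify a 95% level, so the threshold is infinite.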

2606.02559 2026-06-02 cs.CL cs.AI

From Layers to Submodules: Rethinking Granularity in Replacement-Based LLM Compression

Elia Cunegatti, Marcus Vukojevic, Erik Nielsen, Giovanni Iacca

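AI summary: Proposes SubFit, a submodule-level, non-contiguous replacement-based compression method that gives attention and feed-forward submodules their own lightweight residual bypasses, achieving a better perplexity-accuracy trade-off across multiple LLMs.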

English abstract

Post-training compression of Large Language Models (LLMs) removes entire architectural components, either deleting them or replacing them with fitted modules. Existing replacement-based methods share two design constraints: full-layer granularity and contiguous selection. We argue that this is overly restrictive: in fact, redundancy in pretrained transformers is not confined to contiguous regions, nor does it evenly distribute between Attention and FeedForward outputs, implying that different strategies best approximate different submodule types and that removable components need not cluster within contiguous depth ranges. Based on this intuition, we introduce SubFit (Submodule-level Fitted residual replacement), which compresses LLMs at the submodule level: Attention and FeedForward submodules are selected non-contiguously, and each receives its own lightweight fitted residual bypass. SubFit operates post-training and requires only calibration data. Across ten LLMs (five base, five instruction-tuned), five sparsity levels from 12.5% to 37.5%, and four replacement-based baselines, SubFit achieves the best aggregate perplexity-accuracy trade-off across the evaluated sparsity levels, with larger gains under aggressive compression. At 25% sparsity, it retains 84.6% of dense downstream accuracy and incurs 2.42x perplexity degradation, against 81.6% and 4.34x for the strongest baselines, while delivering measurable inference speedup and KV-cache savings. Code is available at https://github.com/eliacunegatti/SubFit.

2606.02556 2026-06-02 cs.CL

HERO'S JOURNEY: Testing Complex Rule Induction with Text Games

Anshun Asher Zheng, Kanishka Misra, David I. Beaver, Junyi Jessy Li

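AI summary: This paper proposes the HERO'S JOURNEY benchmark, which uses goal-directed text games to evaluate the rule-reasoning ability of large language models on attribute and procedure induction tasks, finding that models can induce rules but that the ability is limited and uneven, that procedure execution is the bottleneck, and that surface semantics have little effect.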

2606.02553 2026-06-02 cs.CV

LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation

Qixin Hu, Shuai Yang, Wei Huang, Song Han, Yukang Chen

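AI summary: Proposes LongLive-RAG, a framework that treats historical latents in autoregressive video generation as retrievable memory, uses query embeddings to retrieve relevant historical latents, and introduces a Window Temporal Delta Loss, in order to mitigate error accumulation caused by sliding-window attention and improve long-video quality.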

Comments: 20 pages, 7 figures, 4 tables

英文摘要

Autoregressive (AR) video diffusion enables variable-length synthesis, but long-horizon generation often suffers from accumulated errors and identity drift. For efficiency, existing methods commonly adopt sliding-window attention during generation. This creates an irreversible generation trajectory: once the active window accumulates appearance errors, subsequent generations can only condition on this degraded trajectory and drift further away. We address this limitation by formulating long video generation as a retrieval-augmented generation (RAG) problem. Rather than relying solely on the recent window, we treat previously generated latents as a dynamic, searchable history. We propose LongLive-RAG, a general retrieval framework for AR video generation. At each new block, LongLive-RAG uses a query embedding to retrieve relevant historical latents. This lightweight retrieval step adds only a small overhead relative to generation and lets the generator condition on non-local context instead of only the recent window. To make retrieval more discriminative, we introduce the Window Temporal Delta Loss that suppresses redundant local similarity and encourages embeddings to capture meaningful temporal changes. Together, these components help reduce error accumulation caused by sliding-window attention. Experiments across multiple AR backbones and generation lengths show improved long-video quality and the best average VBench-Long rank. To our knowledge, among open-ended AR long video generation methods, LongLive-RAG is the first to formulate self-generated latent history as content-addressable retrieval memory. Code is available at https://github.com/qixinhu11/LongLive-RAG.
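The retrieval step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how similarity is measured or which blocks are eligible, so cosine similarity, the exclusion of the most recent window, and the names `retrieve_history`, `recent_window`, and `top_k` are all assumptions made for illustration, not the authors' implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_history(query, memory, recent_window, top_k):
    """Return indices of the top_k stored block embeddings most similar to query.

    Assumption: the last `recent_window` blocks are skipped because the
    sliding window already attends to them.
    """
    cutoff = max(0, len(memory) - recent_window)
    scored = [(cosine(query, memory[i]), i) for i in range(cutoff)]
    # Highest similarity first; ties broken by older index.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scored[:top_k]]


# Toy 2-D "embeddings" of previously generated blocks (hypothetical values).
memory = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.5, 0.5]]
print(retrieve_history([1.0, 0.05], memory, recent_window=2, top_k=2))
```

In a real system the retrieved indices would select stored latents to condition on alongside the recent window; this sketch only covers the lookup.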

2606.02552 2026-06-02 cs.CV cs.AI

Modeling Depth Ambiguity: A Mixture-Density Representation for Flying-Point-Free Depth Estimation

建模深度歧义：一种用于无飞点深度估计的混合密度表示

Siyuan Bian, Congrong Xu, Jun Gao

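AI summary: Proposes MDA, a mixture-density representation that predicts multiple depth hypotheses and their probabilities for each pixel, addressing flying-point artifacts at boundaries in depth estimation; it substantially improves boundary reconstruction and largely removes flying points.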

AI中文摘要

尽管深度估计取得了进展，飞点仍然是一个持续存在的失败模式：在物体边界附近，深度估计器经常在前景和背景表面之间的空白空间中预测虚假的3D点。我们将这种伪影追溯到一种标准建模选择：为每个像素分配单个深度假设。在边界处，一个像素可能跨越前景和背景表面，因此其真实深度在两者之间是模糊的。预测单个深度的模型无法同时保留两种可能性，因此训练反而将预测拉向一个位于两个表面之间的中间深度。我们通过MDA解决了这个问题，这是一种混合密度表示，让模型为每个像素预测多个深度假设及其相关概率。在边界附近，不同的假设可以与不同的表面对齐，解码后的深度从这些假设之一中选择，而不是放置在它们之间的空白空间中。在不同的骨干网络上，MDA显著改善了边界重建，并在很大程度上消除了飞点伪影，即使在严重的输入模糊下也是如此，同时增加了可忽略的运行时开销。相同的混合密度框架自然地扩展到透明物体，其中它预测透明像素处的多个深度层，以及天空区域，其中专用组件将无界天空与有限深度区域分开，产生无飞点的天际线。项目页面：https://biansy000.github.io/mda-site/。

英文摘要

Despite advances in depth estimation, flying points remain a persistent failure mode: near object boundaries, depth estimators often predict spurious 3D points in the empty space between foreground and background surfaces. We trace this artifact to a standard modeling choice: assigning each pixel a single depth hypothesis. At boundaries, a pixel can straddle a foreground and a background surface, so its true depth is ambiguous between the two. A model that predicts a single depth cannot keep both possibilities, so training instead pulls the prediction toward an intermediate depth that lies on neither surface. We address this with MDA, a mixture-density representation that lets the model predict multiple depth hypotheses and their associated probabilities for each pixel. Near boundaries, different hypotheses can align with different surfaces, and the decoded depth is selected from one of these hypotheses rather than placed in the empty space between them. Across different backbones, MDA substantially improves boundary reconstruction and largely removes flying-point artifacts even under severe input blur, while adding negligible runtime overhead. The same mixture-density framework naturally extends to transparent objects, where it predicts multiple depth layers at transparent pixels, and to sky regions, where a dedicated component separates the unbounded sky from finite-depth regions, producing flying-point-free skylines. Project Page: https://biansy000.github.io/mda-site/.
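A small numeric sketch shows why selecting one hypothesis avoids a flying point while averaging creates one. The abstract only says the decoded depth is "selected from one of these hypotheses"; choosing the highest-probability hypothesis is an assumption here, and `decode_mode` and `decode_mean` are hypothetical helper names.

```python
def decode_mode(hypotheses, probs):
    """Pick the depth hypothesis with the highest probability (assumed rule)."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return hypotheses[best]


def decode_mean(hypotheses, probs):
    """Probability-weighted average, i.e. what a single-depth regressor tends toward."""
    return sum(d * p for d, p in zip(hypotheses, probs))


# A boundary pixel straddling a foreground surface at 2 m and a background at 10 m.
hypotheses = [2.0, 10.0]
probs = [0.55, 0.45]

print(decode_mode(hypotheses, probs))  # lies on the foreground surface
print(decode_mean(hypotheses, probs))  # lies in empty space between the surfaces
```

The averaged value falls between the two surfaces, which is exactly the spurious 3D point the abstract describes.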

2606.02551 2026-06-02 cs.RO cs.CV

AFUN: Towards an Affordance Foundation Model for Functionality Understanding

AFUN：迈向用于功能理解的可供性基础模型

Zhaoning Wang, Yi Zhong, Jiawei Fu, Henrik I. Christensen, Jun Gao

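AI summary: Proposes AFUN, a model that predicts a task-conditional functional mask and a 3D post-contact motion curve from a single RGB-D image and a language task description. A large-scale standardized data pipeline supports open-world generalization, and the model substantially outperforms existing methods on several benchmarks.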

AI中文摘要

可供性理解连接视觉感知和物理动作，作为开放非结构化真实环境中机器人操作的可解释接口。然而，构建一个不仅理解交互发生的位置和方式，还能跨不同环境、物体和任务泛化的可供性基础模型，仍然是一个长期的研究挑战。现有方法通常只解决部分挑战，要么定位任务相关区域而不指定可执行运动，要么预测运动但可扩展性有限。在本文中，我们提出了AFUN，朝着用于功能理解的可供性基础模型迈出了一步。从单个RGB-D观测和语言任务描述中，AFUN预测任务条件功能掩码（在哪里交互）和3D接触后运动曲线（如何交互）。为了支持开放世界泛化，我们构建了一个大规模标准化数据流水线，将异构的机器人、人类、仿真和真实世界扫描数据转换为共享的可供性模式，包含语言、掩码和以物体为中心的3D运动标签。我们从三个方面评估AFUN：对于可供性分割，AFUN在来自4个基准的8个测试集上以较大优势优于所有基线，平均gIoU/cIoU提高+23.9/+26.3；对于接触点预测，它预测出更精确的点，命中率比最佳基线提高12.7-61.3%；对于3D运动，它在所有三个测试集上均达到最佳性能。AFUN可以部署于真实世界机器人操作，无需针对机器人本体进行微调或使用任务特定启发式方法，展示了适应开放世界可供性任务的能力。项目页面：https://www.zhaoningwang.com/AFUN

英文摘要

Affordance understanding bridges visual perception and physical action, serving as an explainable interface for robot manipulation in open and unstructured real-world environments. Yet, building an affordance foundation model that not only understands where and how the interaction should happen, but also generalizes across diverse environments, objects, and tasks, remains a long-standing research challenge. Existing methods typically address only part of this challenge, either localizing task-relevant regions without specifying executable motion, or predicting motion but with limited scalability. In this paper, we present AFUN, a step towards an affordance foundation model for functionality understanding. From a single RGB-D observation and a language task description, AFUN predicts a task-conditional functional mask (where to interact) and a 3D post-contact motion curve (how to interact). To support open-world generalization, we build a large-scale standardized data pipeline that converts heterogeneous robot, human, simulation, and real-world scan data into a shared affordance schema with language, masks, and object-centric 3D motion labels. We evaluate AFUN from three aspects: for affordance segmentation, AFUN outperforms all baselines by a large margin across 8 test sets from 4 benchmarks, improving mean gIoU/cIoU by +23.9/+26.3; for contact-point prediction, it predicts substantially more accurate points, with a 12.7-61.3% hit-rate gain over the best baseline; and for 3D motion, it achieves the best performance on all three test sets. AFUN can be deployed for real-world robot manipulation without finetuning for robot embodiment or using task-specific heuristics, demonstrating the ability to adapt to open-world affordance tasks. Project page: https://www.zhaoningwang.com/AFUN

2606.02548 2026-06-02 cs.CL

SN-WER: Script-Normalized WER for Multi-Script Indic ASR Evaluation

SN-WER：用于多脚本印度语ASR评估的脚本归一化词错误率

Priyaranjan Pattnayak

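AI summary: Proposes the SN-WER metric, which transliterates reference and hypothesis text into a canonical script before computing WER, addressing WER's overestimation of errors in multi-script settings; evaluation on Indic languages shows it reduces inflated model gaps by up to 12%.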

Comments: Accepted to ACL 2026 MeLLM

AI中文摘要

词错误率（WER）是自动语音识别（ASR）的主要指标，但当参考文本和假设文本以不同脚本编码相同单词时，WER可能高估错误。在多语言设置中，ASR模型可能输出罗马化文本，这一问题很常见。我们提出脚本归一化WER（SN-WER），一种无需训练、仅用于评估的评分方法，在计算WER之前将参考文本和假设文本音译为特定语言的规范脚本。我们在5种印度语言、2个数据集和3个ASR模型上评估了SN-WER。在精心整理的FLEURS数据上，SN-WER将膨胀的模型差距减少了高达12%，而在噪声较大的Common Voice数据上，减少幅度较小或不一致，表明存在真正的识别弱点而不仅仅是脚本不匹配。受控压力测试显示，人为罗马化引起的WER膨胀衰减了67%，而词汇替换控制显示对语义错误的敏感性几乎相同，Delta SN-WER / Delta WER约为1.09。SN-WER对音译器选择、归一化变化具有鲁棒性，并且在评估的印度语设置中，令牌碰撞率低于0.1%。我们认为，SN-WER应作为WER和CER的伴随指标报告，用于脚本不敏感的ASR评估，特别是当转录文本用于下游搜索、索引或多语言LLM流水线时。

英文摘要

Word Error Rate (WER) is the dominant metric for automatic speech recognition (ASR), but it can overestimate errors when references and hypotheses encode the same words in different scripts. This issue is common in multilingual settings where ASR models may emit romanized text. We propose Script-Normalized WER (SN-WER), a training-free, evaluation-only scoring method that transliterates both reference and hypothesis text into a language-specific canonical script before computing WER. We evaluate SN-WER on 5 Indic languages, 2 datasets, and 3 ASR models. On curated FLEURS data, SN-WER reduces inflated model gaps by up to 12%, while on noisier Common Voice data the reductions are smaller or inconsistent, indicating genuine recognition weaknesses rather than only script mismatch. Controlled stress tests show a 67% attenuation of artificial romanization-induced WER inflation, while lexical-substitution controls show near-identical sensitivity to semantic errors, with Delta SN-WER / Delta WER approximately 1.09. SN-WER is robust to transliterator choice and normalization changes, and shows token-collision rates below 0.1% in the evaluated Indic setting. We argue that SN-WER should be reported alongside WER and CER as a companion metric for script-insensitive ASR evaluation, especially when transcripts feed downstream search, indexing, or multilingual LLM pipelines.
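The scoring idea (transliterate both sides into one script, then compute ordinary WER) can be sketched as follows. The two-entry lookup table `TOY_TRANSLIT` and the helper `to_canonical` are hypothetical stand-ins for a real transliterator, which the abstract does not name; only the word-level edit-distance WER is standard.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, 1):
        cur = [i]
        for j, h in enumerate(hyp_words, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = cur
    return prev[-1]


def wer(ref, hyp):
    """Standard WER: edit distance divided by the number of reference words."""
    ref_words, hyp_words = ref.split(), hyp.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


# Hypothetical romanized-to-Devanagari table; a real system would use a transliterator.
TOY_TRANSLIT = {"namaste": "नमस्ते", "duniya": "दुनिया"}


def to_canonical(text):
    """Map each word to the canonical script when the toy table knows it."""
    return " ".join(TOY_TRANSLIT.get(w, w) for w in text.split())


def sn_wer(ref, hyp):
    """Script-normalized WER: normalize both sides, then score as usual."""
    return wer(to_canonical(ref), to_canonical(hyp))


ref = f"{TOY_TRANSLIT['namaste']} {TOY_TRANSLIT['duniya']}"  # native-script reference
hyp = "namaste duniya"                                        # romanized ASR output
print(wer(ref, hyp), sn_wer(ref, hyp))
```

Plain WER counts both words as errors even though they are the same words in another script, while the normalized score does not.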

2606.02547 2026-06-02 cs.GT

Pluralistic Leaderboards

多元排行榜

Nika Haghtalab, Ariel D. Procaccia, Han Shao, Serena Lutong Wang, Kunhe Yang

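AI summary: Addresses the distortion of rankings under the standard Bradley-Terry model when user preferences are heterogeneous, proposing a leaderboard mechanism based on local stability from social choice theory that yields a stable pluralistic leaderboard from only a small number of comparisons per user.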

AI中文摘要

最近基于排行榜的大型语言模型评估通过将Bradley-Terry模型拟合到成对比较来聚合用户反馈，基于潜在质量分数产生单一的全局排名。虽然这种方法因其简单性而具有吸引力，但它与异质性偏好不兼容：当LLM在多样化的任务和用例中使用时，偏好根本不同模型行为的用户在合并为单一质量分数时可能会被系统性地误代表。为了解决这个问题，我们研究了旨在对异质用户群体保持稳定的多元排行榜。借鉴社会选择理论的思想，我们采用了局部稳定性的概念，该概念要求没有排名前k之外的模型被超过O(1/k)比例的用户集体偏好于前k集合。基于社会选择文献中的技术，我们设计了一种替代的排行榜机制，该机制在满足局部稳定性的同时，每个用户仅需提供~O(k)次成对比较，其中k是保证稳定性的前缀大小。使用LMArena的数据，我们表明标准的Bradley-Terry聚合在实践中可能违反局部稳定性，而我们的方法提供了更强的稳定性保证。

英文摘要

Recent leaderboard-based evaluations of large language models aggregate user feedback by fitting a Bradley-Terry model to pairwise comparisons, producing a single global ranking based on a latent quality score. While appealing for its simplicity, this approach is incompatible with heterogeneous preferences: when LLMs are used across diverse tasks and use cases, users who favor fundamentally different model behaviors can be systematically misrepresented when collapsed into a single quality score. To address this issue, we study pluralistic leaderboards that aim to remain stable with respect to heterogeneous user populations. Drawing on ideas from social choice theory, we adapt the notion of local stability, which requires that no model outside the top-$k$ positions is collectively preferred to the top-$k$ set by more than an $O(1/k)$ fraction of users. Building on techniques from the social choice literature, we design an alternative leaderboard mechanism that satisfies local stability while eliciting only $\widetilde{O}(k)$ pairwise comparisons per user, where $k$ is the size of the prefix for which stability is guaranteed. Using data from LMArena, we show that standard Bradley-Terry aggregation can violate local stability in practice, whereas our method provides substantially stronger stability guarantees.
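A rough check for local-stability violations might look like the sketch below. It rests on two assumptions that the abstract does not spell out: a user "prefers a model to the top-k set" when they rank it above every top-k member, and the $O(1/k)$ bound is instantiated as `c / k` with a hypothetical constant `c`. The function name `local_stability_violations` is also illustrative.

```python
def local_stability_violations(leaderboard, user_rankings, k, c=1.0):
    """Return {model: fraction} for outside models that too many users prefer.

    leaderboard: models ordered best to worst.
    user_rankings: one full ranking (best to worst) per user.
    A model outside the top k violates stability when the fraction of users
    ranking it above every top-k model exceeds c / k (assumed threshold).
    """
    top = set(leaderboard[:k])
    threshold = c / k
    violations = {}
    for model in leaderboard[k:]:
        count = 0
        for ranking in user_rankings:
            position = {m: i for i, m in enumerate(ranking)}
            if all(position[model] < position[t] for t in top):
                count += 1
        fraction = count / len(user_rankings)
        if fraction > threshold:
            violations[model] = fraction
    return violations


# Three of five users put C first, yet the leaderboard ranks C last.
rankings = [["C", "A", "B"]] * 3 + [["A", "B", "C"]] * 2
print(local_stability_violations(["A", "B", "C"], rankings, k=2))
```

Here a 60% bloc of users prefers C to both listed models, so a top-2 list of A and B fails the check under the assumed threshold of 1/2.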

2606.02545 2026-06-02 cs.CL

Transferable Self-Harm Surveillance from Emergency Department Triage Notes Using an Evidence-Augmented Machine Learning Approach

基于证据增强机器学习方法的急诊科分诊笔记可迁移自伤监测

Liuliu Chen, Gowri Rajaram, Eleanor Bailey, Katrina Witt, Michelle Lamblin, Jo Robinson, Mike Conway, Vlada Rozova

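AI summary: This study proposes a three-stage machine learning approach that combines large language model screening with evidence extraction to detect self-harm in emergency department triage notes, and validates its high transferability and fine-grained surveillance capability across three Australian hospitals.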

AI中文摘要

自伤是一个主要的公共卫生问题，但当前依赖于医院就诊的监测因诊断代码敏感性低而不足。急诊科（ED）分诊笔记在初次接触时记录，提供了就诊的简洁摘要，并提供了识别自伤的机会。我们开发了一种三阶段方法，将传统机器学习与大语言模型筛选和证据提取相结合，以检测ED分诊笔记中的自伤行为。我们评估了模型在三个澳大利亚医院的可迁移性。我们的方法在内部和外部验证中分别显示了0.887 ± 0.016和0.884 ± 0.012的AUPRC。在前瞻性评估中，它在开发站点达到了0.881 ± 0.008的AUPRC，在两个外部站点无需特定站点重新训练的情况下分别达到了0.879 ± 0.012和0.816 ± 0.015。该方法的一个关键优势是能够以95%的准确率识别主要的自伤方式，支持超越二元分类的更细粒度监测。

英文摘要

Self-harm is a major public health concern, but current surveillance relying on hospital presentations is inadequate due to the low sensitivity of diagnostic codes. Emergency Department (ED) triage notes, recorded at the initial point of contact, provide a succinct summary of presentations and an opportunity to identify self-harm. We developed a three-stage approach, augmenting traditional machine learning with large language model-based screening and evidence extraction to detect self-harm in ED triage notes. We assessed model transferability across three Australian hospitals. Our approach showed AUPRCs of 0.887 ± 0.016 and 0.884 ± 0.012 during internal and external validation, respectively. Prospectively, it achieved an AUPRC of 0.881 ± 0.008 at the development site, and 0.879 ± 0.012 and 0.816 ± 0.015 at two external sites without site-specific retraining. A key advantage of the approach is that it enables identification of the primary self-harm method with an accuracy of 95%, supporting more granular surveillance beyond binary classification.

2606.02544 2026-06-02 cs.CL cs.AI

SimSD: Simple Speculative Decoding in Diffusion Language Models

SimSD：扩散语言模型中的简单推测解码

Junxia Cui, Haotian Ye, Runchu Tian, Hongcan Guo, Jinya Jiang, Haoru Li, Chaojie Ren, Yiming Huang, Kaijie Zhu, Zhongkai Yu, Kun Zhou, Jingbo Shang

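AI summary: Because diffusion language models cannot directly use standard speculative decoding, proposes the SimSD algorithm, which introduces reference tokens through a plug-and-play masking strategy and designs an attention mask so that multiple drafted tokens can be verified in a single forward pass, raising decoding throughput while preserving the advantages of parallel decoding.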

Comments: 13 pages, 4 figures, code available at https://github.com/airevo2/SimSD-release

AI中文摘要

扩散大语言模型（dLLMs）最近作为自回归（AR）LLMs的有前景替代方案出现，通过并行或块状解码实现更快的推理。然而，它们的掩码语言建模公式仍然与标准令牌级推测解码不兼容，而后者是AR模型最有效的加速技术之一。在AR解码中，因果掩码保留了时间上有效的令牌级上下文，使目标模型能够在单次前向传播中验证多个草稿令牌。相比之下，dLLMs依赖于掩码令牌和双向注意力，导致有效上下文在去噪步骤中发生变化，从而阻止了直接的令牌级推测验证。为了弥合这一差距，我们提出了一种简单但有效的扩散语言模型推测解码算法，名为SimSD，它主要采用即插即用的掩码策略，为dLLMs配备时间上有效的令牌级上下文以进行推测解码。我们的方法明确地从草稿模型预测中引入参考令牌，并设计了一种注意力掩码来调节它们与当前步骤令牌的交互，使dLLMs能够在单次前向传播中计算草稿令牌的有效logits。这恢复了AR模型中因果掩码提供的关键验证能力，同时保留了dLLMs的并行解码优势。所提出的方法无需训练，并且可以灵活地与其他加速技术（如KV缓存和块状解码）集成。在四个基准测试上的SDAR系列dLLMs实验表明，我们的方法实现了高达7.46倍的解码吞吐量提升，同时保持甚至提高了平均生成质量。

英文摘要

Diffusion large language models (dLLMs) have recently emerged as a promising alternative to autoregressive (AR) LLMs, offering faster inference through parallel or blockwise decoding. However, their masked language modeling formulation remains incompatible with standard token-level speculative decoding, one of the most effective acceleration techniques for AR models. In AR decoding, the causal mask preserves temporally valid token-level contexts, enabling a target model to verify multiple drafted tokens in a single forward pass. In contrast, dLLMs rely on mask tokens and bidirectional attention, causing the effective context to change across denoising steps and preventing direct token-level speculative verification. To bridge this gap, we propose a simple but effective speculative decoding algorithm for diffusion language models, named SimSD, which mainly adopts a plug-and-play masking strategy that equips dLLMs with temporally valid token-level contexts for speculative decoding. Our method explicitly introduces reference tokens from draft-model predictions and designs an attention mask that regulates their interaction with current-step tokens, allowing dLLMs to compute valid logits for drafted tokens in a single forward pass. This restores the key verification ability provided by causal masking in AR models while preserving the parallel decoding advantages of dLLMs. The proposed method is training-free and can be flexibly integrated with other acceleration techniques such as KV cache and blockwise decoding. Experiments on SDAR-family dLLMs across four benchmarks show that our method achieves up to 7.46x higher decoding throughput while maintaining and even improving average generation quality.
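For readers unfamiliar with the verification step that the abstract says AR models get from causal masking, here is a sketch of generic greedy draft verification. It is not SimSD's masking strategy, which the abstract does not detail; `accept_prefix` is a hypothetical name, and `target_argmax` is assumed to hold the target model's greedy prediction at each drafted position, all obtained from one forward pass.

```python
def accept_prefix(draft_tokens, target_argmax):
    """Greedy speculative verification.

    Accept drafted tokens while they match the target model's own greedy
    choice; at the first mismatch, emit the target's token and stop.
    """
    accepted = []
    for drafted, verified in zip(draft_tokens, target_argmax):
        if drafted != verified:
            accepted.append(verified)  # correction supplied by the target model
            break
        accepted.append(drafted)
    return accepted


# The draft agrees on the first two tokens, then diverges.
print(accept_prefix([5, 7, 9, 2], [5, 7, 1, 2]))
```

Each verification pass therefore yields at least one token and possibly several, which is where the throughput gain of speculative decoding comes from.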

2606.02540 2026-06-02 cs.CL

SkillHarm: Lifecycle-Aware Skill-Based Attacks via Automated Construction

SkillHarm: 通过自动化构建实现生命周期感知的基于技能的攻击

Yuting Ning, Zhehao Zhang, Yash Kumar Lal, Boyu Gou, Junyi Li, Weitong Ruan, Chentao Ye, Rahul Gupta, Diyi Yang, Yu Su, Huan Sun

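AI summary: Proposes the SkillHarm benchmark, which systematically evaluates the risks of skill-based attacks across the skill-use lifecycle through two attack scenarios, Fixed-Payload Poisoning and Self-Mutating Poisoning, and builds the automated pipeline AutoSkillHarm to generate attack samples at scale.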

Comments: Work in Progress

AI中文摘要

智能体技能在智能体工作流中占据特权地位，因为智能体被期望隐式地遵循并执行它们，这使得第三方技能成为易受攻击的表面。现有研究揭示了由基于技能的攻击引起的不安全智能体行为，但它们主要评估单个任务执行中的投毒技能，并通过临时风险列表枚举危害。为弥补这些不足，我们引入SkillHarm，这是一个跨越技能使用生命周期的基于技能的攻击基准，并配有一个系统化的技能相关风险分类。SkillHarm评估两种攻击场景：固定载荷投毒（FPP），其中固定的投毒技能包直接危害任何调用它的任务会话；以及自我变异投毒（SMP），其中最初良性的执行静默地变异持久的技能内容，将危害延迟到后续重用。它进一步根据危害针对的智能体工作流组件定义了12种风险类型：数据管道、系统环境和智能体自主性。为了大规模实例化这些攻击，我们构建了AutoSkillHarm，一个由自然语言驱动、编码智能体驱动的自动化构建管道。由此产生的基准包含71个技能的879个攻击样本。实验表明，当前智能体仍然易受攻击，FPP和SMP的攻击成功率分别高达86.3%和69.3%。我们的分析进一步揭示了一个潜在风险：许多明显的攻击失败源于智能体未能与被投毒的文件交互，而非真正的抵抗，而当前的防御措施仍然无法可靠地缓解这一威胁。

英文摘要

Agent skills occupy a privileged position in the agent workflow, as agents are expected to implicitly follow and execute them, rendering third-party skills a vulnerable attack surface. Existing studies have revealed unsafe agent behaviors induced by skill-based attacks, but they primarily evaluate poisoned skills within a single task execution and enumerate harms through ad-hoc risk lists. To bridge these gaps, we introduce SkillHarm, a benchmark of skill-based attacks across the skill-use lifecycle, paired with a systematic taxonomy of skill-relevant risks. SkillHarm evaluates two attack scenarios: Fixed-Payload Poisoning (FPP), where a fixed poisoned skill package directly compromises any task session that invokes it, and Self-Mutating Poisoning (SMP), where an initially benign execution silently mutates persistent skill content, deferring harm until a subsequent reuse. It further defines 12 risk types based on the agent workflow component targeted by the harm: data pipelines, system environments, and agent autonomy. To instantiate these attacks at scale, we build AutoSkillHarm, an automated construction pipeline with coding agents driven by natural-language harnesses. The resulting benchmark contains 879 attack samples across 71 skills. Experiments show that current agents remain vulnerable with attack success rates up to 86.3% in FPP and 69.3% in SMP. Our analysis further reveals a latent risk: many apparent attack failures stem from the agent failing to engage with the poisoned file rather than genuine resistance, and current defenses still fail to reliably mitigate the threat.

2606.02537 2026-06-02 physics.soc-ph cs.SI

A Guide to Higher-Order Homophily

高阶同质性指南

Moritz Laber, Brennan Klein

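AI summary: This paper surveys measures and models for quantifying higher-order homophily and heterophily in hypergraphs, giving researchers a basis for methodological choices and future developments.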

AI中文摘要

同质性（相似个体间交互的过度代表）和异质性（不同个体间交互的普遍增加）是社交网络中常见的混合模式。随着超图越来越多地用于表示社会系统，对同质性和异质性的高阶视角变得愈发重要。在此，我们提供对这一问题的两个互补视角：首先，我们调查了可用于量化超图中同质性（或异质性）的度量——强调与现有成对度量的概念差异——并通过深入示例解释每个度量。其次，我们概述了高阶混合模式的超图模型，区分了具有不同用例的几个模型家族。通过提供现有方法的指南并综合当前关于高阶同质性和异质性的知识体系，我们为知情的方法论选择和未来发展奠定了基础。

英文摘要

Homophily, the overrepresentation of interactions among similar individuals, and heterophily, the elevated prevalence of interactions among dissimilar ones, are frequently observed mixing patterns in social networks. As hypergraphs are increasingly used to represent social systems, a higher-order perspective on homophily and heterophily becomes ever more relevant. Here, we provide two complementary perspectives on this problem: First, we survey measures that can be used to quantify homophily (or heterophily) in hypergraphs -- emphasizing conceptual differences to existing pairwise measures -- and explain each measure through in-depth examples. Second, we provide an overview of hypergraph models for higher-order mixing patterns, distinguishing several model families with distinct use cases. By providing a guide to existing methods and synthesizing the current body of knowledge on higher-order homophily and heterophily, we lay the basis for informed methodological choices and future developments.

2606.02536 2026-06-02 cs.AI

Tracking the Behavioral Trajectories of Adapting Agents

追踪自适应智能体的行为轨迹

Jonah Leshin, Manish Shah, Ian Timmis

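AI summary: Proposes defining agent traits as directions in a text embedding space and training a linear model to score skill-file diffs, achieving high accuracy in classifying and ranking behavioral traits.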

Comments: 5 pages, 1 figure. To appear at the Second Workshop on Agents in the Wild: Safety, Security, and Beyond (AIWILD) at ICML 2026

AI中文摘要

文本文件，如技能文件、记忆文件和行为配置文件，在定义现代智能体的行为方式中起着核心作用。通过人类或智能体自身的编辑，这些文件可能随时间演变，直接引导智能体在未来交互中的行为。我们提出了一种方法和框架，通过将特质定义为文本嵌入模型嵌入空间中的方向来测量智能体的“特质”。我们在标记的“之前”与“之后”技能文件差异上训练线性模型以学习特质向量，然后通过将嵌入差异投影到该向量上对任意技能编辑进行评分。在68个标记的技能差异对上评估寻求敏感数据的特质倾向，我们的方法在留一法交叉验证下实现了91.2%的符号分类准确率和斯皮尔曼等级相关系数ρ=0.82。我们将这种特质评估构建到一个更广泛的智能体间协议中，使一个智能体能够通过可信中介评估另一个智能体的技能文件更新。

英文摘要

Text files such as skill files, memory files, and behavioral configuration files play a central role in defining how modern agents act. Through edits by humans or the agents themselves, these files may evolve over time, directly steering the agent's behavior in future interactions. We present a methodology and framework for measuring agent traits by defining traits as directions in the embedding space of a text embedding model. We train a linear model on labeled "before" versus "after" skill file diffs to learn a trait vector, then score arbitrary skill edits by projecting their embedding diffs onto this vector. Evaluated on 68 labeled skill diff pairs for the trait of propensity to seek sensitive data, our method achieves 91.2% sign classification accuracy and a Spearman rank correlation of $\rho = 0.82$ under leave-one-out cross-validation. We build this trait evaluation into a broader agent-to-agent protocol that enables one agent to evaluate another's skill file updates through a trusted intermediary.
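The projection step can be sketched in a few lines. The abstract says a linear model is trained on labeled diffs; the label-weighted mean direction used below is a deliberately simpler stand-in, and the toy 2-D vectors, the +1/-1 labels, and the names `fit_trait_vector` and `score_edit` are assumptions for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def subtract(a, b):
    return [x - y for x, y in zip(a, b)]


def fit_trait_vector(diffs, labels):
    """Unit vector along the label-weighted sum of embedding diffs.

    diffs: embedding(after) - embedding(before) for each labeled edit.
    labels: +1 if the edit increased the trait, -1 if it decreased it.
    This replaces the paper's trained linear model with a simple heuristic.
    """
    direction = [0.0] * len(diffs[0])
    for diff, label in zip(diffs, labels):
        for i, value in enumerate(diff):
            direction[i] += label * value
    norm = math.sqrt(dot(direction, direction))
    return [x / norm for x in direction]


def score_edit(before, after, trait):
    """Project an edit's embedding diff onto the trait vector."""
    return dot(subtract(after, before), trait)


# Hypothetical labeled diffs in a 2-D embedding space.
trait = fit_trait_vector([[1.0, 0.0], [-1.0, 0.2], [0.8, -0.1]], [1, -1, 1])
print(score_edit([0.0, 0.0], [0.5, 0.0], trait))   # positive: trait increases
print(score_edit([0.0, 0.0], [-0.5, 0.0], trait))  # negative: trait decreases
```

The sign of the score corresponds to the sign-classification task in the abstract, and the magnitude gives a ranking over edits.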

2606.02535 2026-06-02 cs.CV

LL-Bench: Rethinking Low-Level Vision Evaluation in the Era of Large-Scale Generative Models

LL-Bench: 在大规模生成模型时代重新思考低级视觉评估

Lu Liu, Huiyu Duan, Chenxin Zhu, Jintong Lu, Haoyun Jiang, Liu Yang, Qiang Hu, Guangtao Zhai, Xiaoyun Zhang

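AI summary: Proposes the LL-Bench benchmark, which contains a large set of real-world degraded images and human preference annotations, to systematically evaluate large-scale generative models on low-level vision tasks, and introduces the LL-Score evaluator to align better with human preferences.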

AI中文摘要

大规模生成模型在图像生成和编辑任务中展现了卓越的能力。然而，它们在需要像素级控制的低级视觉任务中的表现仍未得到充分研究。为填补这一空白，我们引入了LL-Bench，一个用于评估大规模生成模型在低级视觉任务上能力的全面基准。该基准包含覆盖16种低级退化任务的2,469张真实退化图像，以及由10个最先进的大规模生成模型和21个传统恢复模型生成的28,919张恢复图像，这些图像附有152,020个专家级成对人类偏好和28,334个质量评分。基于LL-Bench，我们进行了系统诊断，揭示了大规模生成模型在不同低级视觉任务中的性能边界和独特失败模式，并与传统代表性恢复方法进行了比较。此外，我们研究了当前质量评估指标在LL-Bench上的有效性，发现它们与人类评分存在显著差异。为了更好地使恢复图像质量评估与人类偏好对齐，我们进一步提出了LL-Score，一个基于MLLM的评估器，能够同时捕捉恢复质量和幻觉存在。大量实验表明，LL-Score不仅优于现有的图像质量评估指标，而且可以作为有前景的奖励模型，用于训练低级视觉任务的生成模型。

英文摘要

Large-scale generative models have demonstrated remarkable capabilities across image generation and editing tasks. However, their performance in low-level vision tasks, which require pixel-wise control, remains insufficiently studied. To address this gap, we introduce LL-Bench, a comprehensive Benchmark for evaluating the capabilities of large-scale generative models on Low-Level vision tasks. The benchmark comprises 2,469 real-world degraded images covering 16 low-level degradation tasks, and 28,919 restored images produced by 10 state-of-the-art large-scale generative models and 21 conventional restoration models, which are annotated with 152,020 expert-level pairwise human preferences and 28,334 quality scores. Built upon LL-Bench, we present a systematic diagnosis that reveals the performance boundaries and unique failure modes of large-scale generative models across diverse low-level vision tasks, compared with conventional representative restoration approaches. Moreover, we investigate the effectiveness of current quality evaluation metrics on LL-Bench, which exhibit significant discrepancy with human ratings. To better align restored-image quality assessment with human preferences, we further propose LL-Score, an MLLM-based evaluator that captures both restoration quality and hallucination existence. Extensive experiments demonstrate that LL-Score not only outperforms existing image quality assessment metrics, but also serves as a promising reward model for training generative models on low-level vision tasks.

2606.02532 2026-06-02 cs.CV

Improving Combined Detection and Classification of TEM Defects via Mask-Conditioned Latent Diffusion Augmentation

通过掩码条件潜在扩散增强改善TEM缺陷的联合检测与分类

Ni Li, Nuohao Liu, Ryan Jacobs, Ajay Annamareddy, Maciej P. Polak, Kevin Field, Izabela Szlufarska, Dane Morgan

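AI summary: Proposes a generative data augmentation method based on a mask-conditioned latent diffusion model (LDM) that synthesizes TEM images with controllable, automatically labeled multi-class defect masks, to improve the defect detection and classification performance of Mask R-CNN models trained on small datasets.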

AI中文摘要

分析透射电子显微镜（TEM）图像中的微观结构缺陷，特别是在辐照金属合金中，通常受到高质量标注数据可用性的限制。为了解决这个问题，我们引入了一种生成式数据增强方法，使用掩码条件潜在扩散模型（LDM）合成具有可控、自动标注的多类缺陷掩码的逼真TEM图像。我们的方法无需生成过程中的人工标注，通过从实验掩码学习到的分布中采样，能够创建合成图像-掩码对。这些生成的数据用于增强不同规模（10、50和100张标注实验图像）的小型实验数据集，以训练Mask区域卷积神经网络（R-CNN）模型进行缺陷检测和分类。我们的结果表明，生成式增强带来了整体模型性能的小幅提升，检测和分类F1分数的调和平均值最高提升0.02。然而，我们也发现检测和分类改进的相对贡献取决于特定的训练/测试数据划分。这些发现凸显了针对性生成模型在数据稀缺的基于显微镜的图像量化任务中提升深度学习性能的潜力。

英文摘要

Analyzing microstructural defects in transmission electron microscopy (TEM) images, particularly in irradiated metal alloys, is often limited by the availability of high-quality, labeled data. To address this, we introduce a generative data augmentation approach using a mask-conditioned latent diffusion model (LDM) for synthesizing realistic TEM images with controllable, automatically labeled multi-class defect masks. Without requiring manual annotations for generation, our method enables the creation of synthetic image-mask pairs by sampling distributions learned from experimental masks. These generated data were used to augment small experimental datasets of varying sizes (10, 50, and 100 labeled experimental images) to train a Mask Regional Convolutional Neural Network (R-CNN) model for defect detection and classification. Our results show that generative augmentation yields small overall model performance improvements, with up to a 0.02 gain in the harmonic mean of detection and classification F1 scores. However, we also find that the relative contributions to detection and classification improvement depend on the specific train/test data split. These findings highlight the potential of targeted generative models to enhance deep learning performance in data-scarce microscopy-based image quantification tasks.

2606.02530 2026-06-02 cs.AI cs.CL

SafeSteer: Localized On-Policy Distillation for Efficient Safety Alignment

SafeSteer: 局部化在策略蒸馏用于高效安全对齐

Hao Li, Jingkun An, Zijun Song, Pengyu Zhu, Rui Li, Hao Wang, Wendi Feng, Yesheng Liu, Lijun Li, Jin-Ge Yao, Lei Sha

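AI summary: To address the loss of general capability caused by safety alignment of large language models, proposes SafeSteer, which builds a safety teacher via activation steering, selects safety tokens, and applies the reverse KL penalty only to those tokens. With only 100 harmful samples and no general-purpose data, it achieves a superior balance between safety and general capability.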

Comments: 19 pages, 8 figures, 14 tables. Submitted to EMNLP 2026

AI中文摘要

将大型语言模型（LLMs）与人类价值观对齐通常会降低其通用能力，这被称为对齐税。现有方法通过平衡双重目标来缓解这一问题，但严重依赖大量通用数据或辅助奖励模型。在本文中，我们认为，由于安全特征在输出分布中本质上是稀疏的，对齐需要局部修改而非全局权衡。为此，我们提出SafeSteer，它在安全令牌上执行在策略蒸馏。首先，我们通过激活引导构建一个安全教师。基于该教师，我们开发了一种安全令牌选择算法。因此，SafeSteer在训练期间将反向KL惩罚限制在这些令牌上，以保留通用能力。跨多种模型的实验结果表明，与现有方法相比，我们的SafeSteer在安全性和通用能力之间实现了更优越的权衡，在七个安全基准上取得了强大的安全性能，同时在五个通用能力基准上仅有最小程度的下降。值得注意的是，SafeSteer仅需100个有害样本，无需使用任何通用数据，不到先前基线所用数据的1%，大大降低了对齐成本。更多详情请访问我们的项目页面：https://anjingkun.github.io/SafeSteer。

英文摘要

Aligning Large Language Models (LLMs) with human values often degrades their general capabilities, termed the alignment tax. Existing methods mitigate this by balancing dual objectives, which heavily rely on massive general-purpose data or auxiliary reward models. In this paper, we argue that, because safety features are inherently sparse within the output distribution, alignment requires localized modifications rather than global trade-offs. To this end, we propose SafeSteer, which performs on-policy distillation confined to safety tokens. First, we construct a safety teacher via activation steering. Based on this teacher, we develop a safety token selection algorithm. Consequently, SafeSteer restricts the reverse KL penalty to these tokens during training to preserve general capabilities. Experimental results across diverse models show that our SafeSteer achieves a superior trade-off between safety and general capability compared with existing methods, attaining strong safety performance on seven safety benchmarks with only minimal degradation on five general capability benchmarks. Notably, SafeSteer requires only 100 harmful samples without using any general-purpose data, less than 1% of what previous baselines used, considerably reducing alignment cost. More details are on our project page at https://anjingkun.github.io/SafeSteer.
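Restricting a reverse KL penalty to selected positions can be sketched as below. The abstract does not describe the token selection algorithm, so the boolean `safety_mask` is simply taken as given, and averaging over the selected positions is an assumption; `reverse_kl` and `masked_reverse_kl` are hypothetical names. Reverse KL here means KL(student || teacher), and teacher probabilities are assumed nonzero wherever the student's are.

```python
import math


def reverse_kl(student, teacher):
    """KL(student || teacher) for one token position's output distribution."""
    return sum(p * math.log(p / q) for p, q in zip(student, teacher) if p > 0)


def masked_reverse_kl(student_dists, teacher_dists, safety_mask):
    """Average reverse KL over positions flagged as safety tokens only.

    Positions with a False mask contribute nothing, so the student is not
    pulled toward the teacher there.
    """
    selected = [
        reverse_kl(s, t)
        for s, t, keep in zip(student_dists, teacher_dists, safety_mask)
        if keep
    ]
    return sum(selected) / len(selected) if selected else 0.0


# Two positions over a 2-token vocabulary; the models disagree only at position 1.
student = [[0.5, 0.5], [0.9, 0.1]]
teacher = [[0.5, 0.5], [0.1, 0.9]]
print(masked_reverse_kl(student, teacher, [True, False]))  # agreement: no penalty
print(masked_reverse_kl(student, teacher, [False, True]))  # disagreement: penalized
```

Leaving unselected positions out of the loss is what keeps the update local, in line with the abstract's argument that alignment needs localized rather than global modifications.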

2606.02529 2026-06-02 math.OC cs.GT cs.MA cs.SY eess.SY

A No-Regret Framework for Adaptive Incentive Design

自适应激励设计的无遗憾框架

Georgios Vasileiou, Lantian Zhang, Silun Zhang

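AI summary: For games with continuous action spaces and private costs, proposes a no-regret adaptive incentive design framework that uses a switching incentive policy to achieve parameter estimation and regret minimization.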

Comments: 21 pages, 5 figures

AI中文摘要

激励设计研究中央机构如何通过支付、补贴或税收影响策略性智能体，使个体目标与集体福利一致。本文针对连续动作空间和私有智能体成本的非线性博弈，提出了一个无遗憾自适应激励设计（RAID）框架。在该框架中，机构（规划者）设计激励措施，将纳什均衡调节到社会最优行动配置，同时从重复的策略响应中学习智能体的未知偏好。我们形式化了RAID问题，并构建了一个最小二乘估计器，其强一致性仅需递减激励。利用这一弱激励要求，我们提出了一种切换激励策略，在探测（探索）和基于估计（利用）的激励之间交替。所得策略几乎必然实现$O(t^{-0.5})$的参数估计速率和$O(t^{0.5}\log t)$的平方社会成本遗憾。我们进一步将框架扩展到内生噪声响应模型，其中由于噪声与智能体响应之间的变量误差相关性，标准最小二乘估计存在偏差。我们利用重复采样估计器和相应的切换策略，保持相同的几乎必然收敛和遗憾速率。数值实验验证了该方法的有效性和预测的收敛速率。

英文摘要

Incentive design studies how a central authority can influence strategic agents through payments, subsidies, or taxes, so that individual objectives align with collective welfare. This paper introduces a No-Regret Adaptive Incentive Design (RAID) framework for nonlinear games with continuous action spaces and private agent costs. In this framework, the authority (planner) designs incentives that regulate the Nash equilibrium toward a socially optimal action profile, while simultaneously learning agents' unknown preferences from repeated strategic responses. We formulate the RAID problem and construct a least-squares estimator whose strong consistency requires only diminishing excitation. Leveraging this weak excitation requirement, we propose a switching incentive policy that alternates between probing (exploration) and estimate-based (exploitation) incentives. The resulting policy achieves an $O(t^{-0.5})$ parameter estimation rate and accumulates $O(t^{0.5}\log t)$ squared social-cost regret, almost surely. We further extend the framework to an endogenous-noise response model, where standard least-squares estimation is biased due to an error-in-variables correlation between the noise and agent responses. We utilize a repeated-sampling estimator and corresponding switching policy that retain the same almost-sure convergence and regret rates. Numerical experiments validate the effectiveness and predicted convergence rates of the method.

2606.02528 2026-06-02 q-fin.GN cs.CY cs.LG

Auditing Asset-Specific Preferences in Financial Large Language Models: Evidence from Bitcoin Representations and Portfolio Allocation

审计金融大语言模型中的资产特定偏好：来自比特币表征与投资组合配置的证据

Wenbin Wu

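AI summary: Using a three-level audit protocol, this study finds that large language models show frame-dependent preferences toward Bitcoin and identifies a Bitcoin-selective internal feature that can be causally intervened on and that significantly affects downstream portfolio allocation.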

Comments: 28 pages, 5 figures, 18 tables

AI中文摘要

大型语言模型现已驱动机器人顾问和交易代理，但它们是否对特定资产存在固有偏见尚未得到充分检验。我们提出三个问题：LLMs是否系统性地偏好某些金融工具；能否识别出对这些偏好具有因果杠杆作用的内部表征；以及该表征是否影响下游金融决策。我们开发了一个三级审计协议并将其应用于比特币。首先，对八个前沿LLMs的行为审计显示，比特币在货币类工具中的排名具有框架依赖性：模型将其置于“可靠货币”的第5位（共8位），但在危机和自主代理框架下接近榜首，且属性交换实验确认排名追踪功能属性而非名称。其次，我们打开模型内部：在Gemma 3中搜索数千个稀疏自编码器特征，识别出一个主导的比特币选择性特征。放大该特征会使模型偏向该资产，抑制则使其远离，即使提示中从未出现“比特币”。第三，我们测试金融后果：放大使比特币在投资组合中的份额提高5.2个百分点，而抑制降低4.6个百分点，放大在加密资产内重新分配，抑制则削减总加密敞口。我们将此描述为有界行为杠杆（杠杆指对输出的因果影响，而非金融杠杆）：一个可识别的内部特征可被扰动以改变金融选择，但仅在可测量的限度内。该框架将内部表征与外部建议联系起来，并通过随机对照和机制边界进行验证。随着LLMs成为自主金融代理，这是迈向新兴“了解你的代理”（KYA）标准的行为层的第一步：了解代理偏好什么，以及该偏好可被移动多远。

英文摘要

Large language models now power robo-advisors and trading agents, yet whether they carry built-in biases toward specific assets is largely untested. We ask three questions: do LLMs systematically prefer certain financial instruments; can an internal representation with causal leverage over those preferences be identified; and does that representation affect downstream financial decisions? We develop a three-level audit protocol and apply it to Bitcoin. First, a behavioral audit of eight frontier LLMs shows that Bitcoin's ranking among money-like instruments is frame-dependent: models place it around rank 5 of 8 as "reliable money" but near the top under crisis and autonomous-agent frames, and an attribute-swap experiment confirms rankings track functional properties, not names. Second, we open a model's internals: a search across thousands of sparse-autoencoder features in Gemma 3 identifies a dominant Bitcoin-selective feature. Amplifying it shifts the model toward the asset and suppressing it shifts the model away, even when "Bitcoin" never appears in the prompt. Third, we test financial consequences: amplification raises Bitcoin's portfolio share by 5.2 percentage points while suppression lowers it by 4.6 pp, with amplification reallocating within crypto and suppression cutting total crypto exposure. We characterize this as bounded behavioral leverage (leverage meaning causal influence over outputs, not financial leverage): an identifiable internal feature can be perturbed to move financial choices, but only within measurable limits. The framework links internal representations to external recommendations, validated with random controls and mechanism boundaries. As LLMs become autonomous financial agents, this is a first step toward a behavioral layer for emerging know-your-agent (KYA) standards: knowing what an agent prefers, and how far that preference can be moved.
