arXivDaily arXiv每日学术速递 周一至周五更新
重置
2605.29678 2026-05-29 cs.CL

Spurious Prompts: Can Irrelevant Prompts Steer Large Language Models?

虚假提示:无关提示能否引导大型语言模型?

Pawel Batorski, Abtin Pourhadi, Jerzy Sarosiek, Przemyslaw Spurek, Paul Swoboda

AI总结 研究语义无关的提示(虚假提示)对大型语言模型行为的影响,提出黑盒搜索方法发现此类提示,并证明其在多个基准和模型上能显著影响模型输出。

详情
AI中文摘要

大型语言模型对提示高度敏感,但这种敏感性通常通过任务相关的指令、示例或推理线索来研究。本文研究了一种不同形式的提示敏感性:与任务语义无关的提示是否仍然能够引导模型行为。我们称其为虚假提示,并展示了其惊人的有效性。我们还提出了一种简单的黑盒搜索程序来发现它们。在推理和问答基准上,使用参数从0.8B到27B、涵盖三个模型家族的模型,我们展示了虚假提示可以提升性能,通常匹配或超越标准提示基线和任务感知的提示优化。我们进一步展示了它们可以引导模型产生非预期行为,例如重复选择第一个答案选项、产生错误答案、返回偶数、质数或小数,而无需明确指示模型这样做。这些发现揭示了一种新的提示敏感性:LLM可以被与它们被要求解决的任务无关的提示系统地引导。我们的代码可在 https://github.com/Batorskq/spurious 获取。

英文摘要

Large language models are highly sensitive to prompts, but this sensitivity is usually studied through task-relevant instructions, demonstrations, or reasoning cues. In this paper, we study a different form of prompt sensitivity: whether prompts that are semantically unrelated to the task can nevertheless steer model behavior. We call them spurious prompts and show their surprising efficacy. We also propose a simple black-box search procedure for discovering them. Across reasoning and question-answering benchmarks, using models ranging from 0.8B to 27B parameters and spanning three model families, we show that spurious prompts can improve performance, often matching or outperforming standard prompting baselines and task-aware prompt optimization. We further show that they can steer models toward unintended behaviors, such as repeatedly selecting the first answer option, producing incorrect answers, returning an even, prime or small number without explicitly instructing the model to do so. These findings reveal a new kind of prompt sensitivity: LLMs can be systematically steered by prompts that are unrelated to the task they are asked to solve. Our code is available at https://github.com/Batorskq/spurious

2605.29676 2026-05-29 cs.AI cs.CL

Notation Matters: A Benchmark Study of Token-Optimized Formats in Agentic AI Systems

符号至关重要:智能体AI系统中令牌优化格式的基准研究

Lorenz Kutschka, Bernhard Geiger

AI总结 本研究在四个智能体基准上评估了两种令牌优化格式TOON和TRON,发现TRON在保持准确率的同时最多减少27%的令牌,而TOON虽减少18%但存在多轮解析失败和并行工具调用输出崩溃的问题。

详情
Comments
16 pages, 6 figures, 4 tables
AI中文摘要

智能体AI系统中的大型语言模型消耗工具模式和执行结果,并发出结构化数据的工具调用。这种交换的默认语言JSON是为应用间交换而非令牌效率设计的,因此其结构元素带来大量令牌开销。最近的工作提出了令牌优化替代方案,如TOON(令牌导向对象表示法)和TRON(令牌减少对象表示法)作为更紧凑的替代,但这些格式仅在孤立的理解或生成任务上进行了评估。它们在端到端智能体循环中是否保持令牌减少仍是一个开放问题。我们在四个智能体基准(BFCL、MCPToolBenchPP、MCP-Universe、StableToolBench)和五个开放权重LLM上评估了TOON和TRON,将输入压缩与输出压缩解耦,以独立测量理解和生成。TRON最多减少27%的令牌,准确率在JSON基线的14个百分点内。TOON实现了最多18%的减少,准确率成本类似为9个百分点,但在多轮解析失败上额外级联,并且对于大多数模型导致并行工具调用输出崩溃。

英文摘要

Large language models in Agentic AI systems consume tool schemas and execution results and emit tool invocations as structured data. The default language for that exchange, JSON, was designed for application-to-application interchange rather than token efficiency, so its structural elements impose substantial token overhead. Recent work proposes token-optimized alternatives such as TOON (Token-Oriented Object Notation) and TRON (Token Reduced Object Notation) as more compact replacements, but these formats have been evaluated only on isolated comprehension or generation tasks. Whether their token reductions hold inside end-to-end agentic loops therefore remains an open question. We evaluate TOON and TRON on four agentic benchmarks (BFCL, MCPToolBenchPP, MCP-Universe, StableToolBench) and five open-weight LLMs, decoupling input compression from output compression to measure comprehension and generation independently. TRON reduces tokens by up to 27% with accuracy within 14pp of the JSON baseline. TOON achieves up to 18% reduction at a similar 9pp accuracy cost, but additionally cascades on multi-turn parsing failures and collapses parallel tool-call output for most models.

2605.29675 2026-05-29 cs.HC cs.AI cs.IR

From Prompts to Context: An Ontology-Driven Framework for Human-Generative AI Collaboration

从提示到上下文:一种面向人类-生成式AI协作的本体驱动框架

Ngoc Luyen Le, Marie-Hélène Abel, Bertrand Laforge

AI总结 提出一种基于本体(CCAI)的框架,通过结构化建模任务、角色、资源和约束,将提示-响应交互转化为可查询的协作轨迹,以提升信息密集型工作流中的可追溯性和问责性。

详情
AI中文摘要

与生成式AI的协作通常始于简短提示,止于不透明输出,隐去了参与者、任务、资源及约束等关键信息。这种上下文显式性的缺失阻碍了信任、可追溯性和问责性,尤其在搜索、查询和档案管理等信息密集型工作流中。本文提出“从提示到上下文”这一本体驱动框架,用于表示人类-生成式AI协作。其核心组件——上下文协作AI本体(CCAI)——将任务、智能体角色、资源和约束等协作关键元素建模为共享的机器可解释词汇。通过将填充的CCAI实例与基于SPARQL的上下文检索相结合,该框架将原本短暂的提示-响应交互转化为结构化、可查询的协作轨迹,连接提示、输出及其周围上下文。通过一个软件开发团队构建基于能力的教育功能(用于查看和更新学习者能力档案)的案例研究,展示了该框架如何支持需求分析、设计、实现和测试阶段的协作片段表示与文档化。结果表明,显式协作建模有助于使任务上下文更清晰,提高AI生成贡献的可追溯性,并支持更透明、更负责任的人类-生成式AI实践。最后,我们提出了未来人类-生成式AI系统的设计原则,强调不仅关注输出质量,还要显式表示产生输出的协作上下文。

英文摘要

Collaborations with Generative AI often begin with a short prompt and end with an opaque output, leaving implicit who was involved, what task was being pursued, which resources were used, and which constraints should have shaped the process. This limited contextual explicitness hinders trust, traceability, and accountability, particularly when Generative AI is embedded in information-intensive workflows such as search, querying, and profile management. This paper introduces From Prompts to Context, an ontology-driven framework for representing Human-Generative AI collaboration. Its core component, the Contextual Collaboration AI Ontology (CCAI), models key elements of collaboration - including tasks, agent roles, resources, and constraints - as a shared machine-interpretable vocabulary. By combining populated CCAI instances with SPARQL-based context retrieval in operational workflows, the framework turns otherwise ephemeral prompt-response interactions into structured and queryable collaboration traces linking prompts, outputs, and their surrounding context. The approach is illustrated through a case study involving a software development team building a competency-based education feature for viewing and updating learner competency profiles. The case study shows how the framework can support the representation and documentation of collaboration episodes across requirements analysis, design, implementation, and testing. Within this setting, the results indicate that explicit collaboration modelling helps make task context more explicit, improves the traceability of AI-generated contributions, and supports more transparent and accountable Human-Generative AI practices. We conclude by outlining design principles for future Human-Generative AI systems that emphasise not only output quality, but also the explicit representation of the collaborative context in which outputs are produced.

2605.29673 2026-05-29 cs.LG cs.CV

A Geometric View of SRC: Learning Representations for Stable Residual Inference

SRC的几何视角:学习用于稳定残差推理的表示

Vangelis P. Oikonomou

AI总结 本文从几何角度分析稀疏表示分类(SRC)的残差排序稳定性,提出几何塑造目标以改善表示学习,并在多个数据集上验证了效果。

详情
Comments
37 pages
AI中文摘要

基于重构的推理通过比较类重构残差来分配类别;稀疏表示分类(SRC)是一个典型实例,其可靠性取决于学习表示的几何结构。我们采用严格的训练-推理分离:SRC仅作为固定的测试时规则使用,在训练过程中从不进行微分、展开或优化。在基于类条件张成子空间及其相关投影残差的张成子空间理想化中,我们通过残差间隔形式化残差排序稳定性,并刻画了可能在最坏方向破坏该间隔的几何障碍——张成子空间重叠、支配以及通过小主角产生的近重叠。这一张成子空间理论是首要的:它指定了理想化残差族何时良好分离,并为实际残差近似(如OMP)提供了条件性的求解器级解释,只要它们接近张成子空间级别的残差排序。在显式的覆盖和分离假设下,我们推导了(理想化)残差间隔的定量下界。在这些目标的指导下,我们提出了几何塑造目标,这些目标促进掩蔽的类内自表达性,抑制跨类重构路径和类间张成子空间对齐,并防止坍塌——而在训练过程中不调用SRC残差或预测。在图像(COIL-100)、文本(TREC)和EEG连接性上的实验,在相同的固定SRC/OMP推理下评估所有表示,并报告残差间隔和几何诊断;交叉熵仅作为相同评估协议下的参考几何包含在内。

英文摘要

Reconstruction-based inference assigns a class by comparing class-wise reconstruction residuals; Sparse Representation Classification (SRC) is a canonical instance whose reliability depends on the geometry of the learned representation. We adopt a strict training-inference separation: SRC is used only as a fixed test-time rule and is never differentiated, unrolled, or optimized during training. In a span-level idealization based on class-conditional spans and their associated projection residuals, we formalize residual-ordering stability through a residual margin and characterize geometric obstructions -- span overlap, dominance, and near-overlap via small principal angles -- that can collapse this margin in worst-case directions. This span-level theory is primary: it specifies when the idealized residual family is well-separated, and it provides a conditional solver-level interpretation for practical residual approximations (e.g., OMP) insofar as they remain close to the span-level residual ordering. Under explicit coverage and separation assumptions, we derive a quantitative lower bound on the (idealized) residual margin. Guided by these targets, we propose geometry-shaping objectives that promote masked within-class self-expressiveness, discourage cross-class reconstruction pathways and inter-class span alignment, and prevent collapse -- without invoking SRC residuals or predictions during training. Experiments on images (COIL-100), text (TREC), and EEG connectivity evaluate all representations under identical fixed SRC/OMP inference and report residual margins and geometric diagnostics; cross-entropy is included only as a reference geometry under the same evaluation protocol.

2605.29670 2026-05-29 cs.CL cs.AI

EviLink: Multi-Path Schema Linking with Uncertainty-Guided Evidence Acquisition for Large-Scale Text-to-SQL

EviLink: 面向大规模Text-to-SQL的基于不确定性引导证据获取的多路径模式链接

Huawei Zheng, Sen Yang, Zhaorui Yang, Yuhui Zhang, Haozhe Feng, Haoxuan Li, Xuan Yi, Chao Hu, Defeng Xie, Chen Hou, Danqing Huang, Wei Chen, Yingcai Wu, Peng Chen, Dazhen Deng

AI总结 提出EviLink方法,通过多假设模式基础与不确定性引导的证据获取,重新定义模式链接为不确定性感知的模式需求推理,以平衡模式完整性、相关性和令牌成本,提升大规模Text-to-SQL性能。

详情
AI中文摘要

模式链接是大规模Text-to-SQL中困难且重要的步骤,系统必须从庞大且模糊的数据库中识别出紧凑且充分的模式上下文。现有方法通常将模式链接视为围绕单个SQL路径的确定性选择,但复杂问题可能允许多个具有不同模式需求的有效实现。我们将模式链接重新定义为对多个可行SQL路径的不确定性感知模式需求推理,其中系统区分必需模式项与路径依赖的不确定项,并仅在需要时获取证据。我们通过EviLink实例化这一重构,它结合了多假设模式基础与不确定性引导的证据获取。在BIRD-Dev和Spider2-Snow上的实验表明,这种视角改善了模式完整性、模式相关性和令牌成本之间的平衡。在Spider2-Snow上,EviLink实现了90.15%的字段级严格召回率,平均使用123.30K令牌,并在固定生成器下提升了下游SQL生成性能。

英文摘要

Schema linking is a difficult and important step in large-scale Text-to-SQL, where systems must identify a compact yet sufficient schema context from large and ambiguous databases. Existing methods often treat schema linking as deterministic selection around a single SQL path, but complex questions may admit multiple valid realizations with different schema needs. We reframe schema linking as uncertainty-aware schema-need inference over multiple plausible SQL paths, where the system distinguishes required schema items from path-dependent uncertain ones and acquires evidence only where needed. We instantiate this reframing with EviLink, which combines multi-hypothesis schema grounding with uncertainty-guided evidence acquisition. Experiments on BIRD-Dev and Spider2-Snow show that this perspective improves the balance among schema completeness, schema relevance, and token cost. On Spider2-Snow, EviLink achieves 90.15% field-level strict recall rate, uses 123.30K average tokens, and improves downstream SQL generation under a fixed generator.

2605.29669 2026-05-29 stat.ML cs.LG math.PR math.ST stat.TH

Eigen-Spike Emergence and Quadratic Equivalents for Conjugate Kernels on Nonlinearly Separable Data

Eigen-Spike 涌现与共轭核在非线性可分数据上的二次等价

Collin Cranston, Zhichao Wang, Todd Kemp, Michael W. Mahoney

AI总结 针对非线性可分数据(XOR问题),通过共轭核矩阵的二次等价模型,分析异常特征值涌现及其与标签对齐的BBP型相变,揭示样本复杂度、信噪比、激活函数和预训练特征对非线性可学习性的影响。

详情
Comments
89 pages, 10 figures
AI中文摘要

近期随机矩阵理论(RMT)工作发展了确定性等价的概念:通常是线性代理模型,用于近似大型非线性随机矩阵(如神经网络中的非线性特征映射)的谱行为。一方面,这些确定性等价通过将复杂模型简化为具有经典RMT工具特性的更简单模型,使理论预测易于处理。然而,这留下了一个问题:在处理高维非线性可分数据(例如对非线性可分数据进行分类)时,这种理想化的线性等价是否仍然有意义。受此启发,我们考虑前馈神经网络的非线性特征映射——共轭核(CK),在典型的非线性可分数据集XOR问题上;我们利用CK中信息性异常特征值的研究及其对应特征向量是否渐近与XOR标签对齐,作为非线性可学习性的代理。我们开发了尖峰CK矩阵的稳健二次等价,从而能够精确分析随着修改机器学习实践中常见的各种旋钮(样本复杂度、信噪比、非线性激活选择以及预训练特征)时涌现的信息性尖峰。在每种情况下,我们推导出精确的BBP型相变,其中通过CK特征向量的线性分类变得可能。我们的分析有助于将RMT中确定性等价工具的力量转化为研究机器学习中实际相关的问题。

英文摘要

Recent work in random matrix theory (RMT) has developed the notion of deterministic equivalents: typically linear surrogate models that approximate the spectral behavior of large nonlinear random matrices, such as nonlinear feature maps in neural networks (NNs). On the one hand, these deterministic equivalents make theoretical predictions tractable by reducing a complex model to a simpler model with properties that fall under the umbrella of classical RMT tools. However, this leaves open the question of whether this idealized linear equivalence remains meaningful when dealing with high-dimensional nonlinearly separable data, such as performing clssification on nonlinearly separable data. Motivated by this, we consider the conjugate kernel (CK), which is the nonlinear feature map of a feedforward NN, under a canonical nonlinearly separable dataset, the XOR problem; and we use the study of informative outlier eigenvalues in the CK and whether their corresponding eigenvectors asymptotically align with XOR labels as a proxy for nonlinear learnability. We develop a robust quadratic equivalent to the spiked CK matrix that enables a precise analysis of emergent informative spikes, as one modifies various knobs common in ML practice: sample complexity, signal-to-noise ratio (SNR), nonlinear activation choice, and pretrained features. In each of these scenarios, we derive a precise BBP-type phase transition in which linear classification via the CK eigenvectors becomes possible. Our analysis helps translate the power of deterministic equivalence tools in RMT to study problems of practical relevance in ML.

2605.29668 2026-05-29 cs.AI cs.CL

GRASP: Gated Regression-Aware Skill Proposer for Self-Improving LLM Agents

GRASP: 门控回归感知技能提议器用于自我改进的LLM智能体

Johannes Moll, Jean-Philippe Corbeil, Jiazhen Pan, Martin Hadamitzky, Daniel Rueckert, Lisa Adams, Keno Bressem

AI总结 提出GRASP方法,通过门控回归感知技能库编辑,在硬回归预算下确保每次技能更新带来净改进,显著提升LLM智能体在结构化环境中的操作可靠性。

详情
AI中文摘要

在结构化环境中运行的LLM智能体以操作方式而非对话方式失败,其可靠性取决于对环境的程序性知识。先前的自我改进方法累积自然语言指导而不检查每个新项目是否保留先前正确的行为,因此修复一条轨迹的笔记可能静默地使另一条轨迹退化。我们引入GRASP(门控回归感知技能提议器),将智能体改进视为对有限技能库的一系列编辑,仅在候选技能在硬回归预算下对平衡的保留探针产生净改进时才接受它。我们在两个基于FHIR的临床基准上评估了GRASP在五个基础模型(gpt-oss-120b、DeepSeek V4 Flash、Gemini 3.1 Flash Lite、GPT-4.1、GPT-5.4)上的表现。在MedAgentBench上,GRASP将gpt-oss-120b从40.6%提升至88.8%,超过五个自我改进基线中最强的21.0个百分点,并将其他每个基础模型提升17.2至40.3个百分点。消融实验将增益归因于比较性提议生成、接受门和硬回归预算,而非技能编写本身——没有验证的技能编写并不比不使用技能更好。该机制泛化到临床领域之外,在四个非临床环境中的三个上改进了智能体,仅在动作空间开放的环境中保持持平。冻结的技能库可在模型间迁移,其中来自更强模型的技能将较弱执行者提升到超出其自身学习能力的水平,而反向则不然,这种不对称性是没有门控的基线无法复现的。

英文摘要

LLM agents acting in structured environments fail in operational rather than conversational ways, and reliability depends on procedural knowledge of the environment. Prior self-improvement methods accumulate natural-language guidance without checking that each new item preserves previously correct behavior, so a note that fixes one trajectory can silently regress another. We introduce GRASP (Gated Regression-Aware Skill Proposer), which treats agent improvement as a sequence of edits to a bounded skill library, admitting each candidate only if it produces a net improvement on a balanced held-out probe under a hard regression budget. We evaluate GRASP across five base models (gpt-oss-120b, DeepSeek V4 Flash, Gemini 3.1 Flash Lite, GPT-4.1, GPT-5.4) on two FHIR-based clinical benchmarks. On MedAgentBench, GRASP lifts gpt-oss-120b from 40.6% to 88.8%, exceeds the strongest of five self-improvement baselines by 21.0 points, and improves every other base model by 17.2 to 40.3 points. Ablations attribute the gain to comparative proposal generation, the acceptance gate, and the hard regression budget rather than to skill writing itself, which without validation is no better than using no skills. The mechanism generalizes beyond the clinical domain, improving agents on three of four non-clinical environments and remaining flat only where the action space is open-ended. Frozen libraries transfer across models, where skills from a stronger model improve weaker executors beyond what they learn for themselves while the reverse does not, an asymmetry that no ungated baseline reproduces.

2605.29667 2026-05-29 cs.CL

Beyond English and Evasion: A Human-Annotated Multi-Domain Benchmark for High-Stakes LLM Safety Evaluation in Chinese

超越英语与规避:用于高风险LLM中文安全评估的人工标注多领域基准

Wajdi Zaghouani, Kholoud K. Aldous, Yicheng Gao

AI总结 针对LLM在中文环境下安全系统失效的问题,构建了包含1,897个对抗性提示的人工标注基准ChiSafe-PAS,覆盖四个高风险领域,并提供完整标注以评估模型安全对齐。

详情
Journal ref
Proceedings of The fourth international workshop on the role of resources in the age of large language models RESOURCEFUL-2026 at LREC 2026, Palma de Mallorca, Spain, 2026
AI中文摘要

当大型语言模型(LLM)部署在中文环境中时,出现了一个令人不安的模式:在英语中运行良好的安全系统会失效。这些系统难以跨越语言和文化的界限,使得模型暴露于利用中文特定规避技术(包括拼音罗马化、汉字分解、网络俚语和模糊语气)的对抗性提示。为解决这一差距,我们引入了ChiSafe-PAS(中文安全试点标注集),这是一个包含1,897个对抗性中文提示的人工标注基准,涵盖四个高风险领域:自残与暴力、毒品与非法交易、欺诈以及讽刺。其中,1,544条条目带有完整的黄金标准标注:一个3类响应标签(拒绝、安全重定向、回应)、一个九类混淆分类、一个风险等级评级以及标注者理由。我们详细描述了数据集设计、标注过程和混淆分类。我们的主要目标是实用的:为研究社区提供一个高质量、基于文化背景的资源,用于基准测试LLM的安全对齐。在此过程中,我们涉及了该领域的三个更广泛的张力:训练数据和评估数据之间模糊的界限、基于现实风险进行领域覆盖的需求,以及规模作为文化专业知识替代品的局限性。

英文摘要

When Large Language Models (LLMs) are deployed in Chinese-language settings, a troubling pattern emerges: safety systems that work well in English break down. These systems struggle to cross linguistic and cultural bound-aries, leaving models exposed to adversarial prompts that exploit Chinese-specific evasion techniques, including Pinyin romanization, character decomposition, internet slang, and hedging tone. To address this gap, we introduce ChiSafe-PAS (Chinese Safety Pilot Annotation Set), a human-annotated benchmark of 1,897 adversarial Chinese prompts spanning four high-stakes domains: self-harm and violence, drug and illicit trade, fraud, and satire. Of these, 1,544 entries carry complete gold-standard annotations: a 3-class response label (REFUSE, SAFE-REDIRECT, RESPOND), a nine-category obfuscation taxonomy, a risk-level rating, and annotator rationale. We describe the dataset design, annotation process, and obfuscation taxonomy in detail. Our primary goal is practical: to give the research community a high-quality, culturally grounded resource for benchmarking LLM safety alignment. In doing so, we engage three broader tensions in the field: the blurring boundary between training and evaluation data, the need for domain coverage grounded in real-world risk, and the limits of scale as a substitute for cultural expertise.

2605.29664 2026-05-29 cs.DC cs.LG

AMDP: Asynchronous Multi-Directional Pipeline Parallelism for Large-Scale Models Training

AMDP:面向大规模模型训练的异步多方向流水线并行

Ling Chen, Houming Wu, Wenjie Yu

AI总结 针对异步流水线并行中参数不匹配导致收敛退化的问题,提出AMDP方法,通过限制流水线第一阶段处理小批量数量、启动多条并发流水线并自适应调整数量、以及跨小批量累积梯度后单次更新,在保持高利用率的同时加速训练并保证收敛。

详情
Comments
Accepted by ICML 2026, 9 pages, and 8 figures
AI中文摘要

流水线并行对于大规模模型训练至关重要,但现有的异步方法常因前向和反向传播之间的参数不匹配而损害收敛性。我们提出异步多方向流水线并行(AMDP)来缓解此问题,同时保持高利用率。AMDP限制每个流水线的第一阶段在反向传播前最多处理两个小批量,从而限制了前向和反向传播之间的参数更新次数。为减轻由此产生的流水线气泡,AMDP启动多条并发流水线,并根据流水线深度自适应调整其数量。此外,AMDP跨小批量累积梯度并在一次更新中应用,确保只有有限数量的小批量经历参数不匹配,且限制在一个优化步骤内。在GPT和BERT风格模型上的实验表明,AMDP在保持收敛的同时显著加速了训练。

英文摘要

Pipeline parallelism is essential for large-scale model training, but existing asynchronous approaches often degrade convergence due to parameter mismatch between forward and backward passes. We propose Asynchronous Multi-Directional Pipeline parallelism (AMDP) to mitigate this issue while sustaining high utilization. AMDP limits the first stage of each pipeline to process at most two minibatches before backpropagation, bounding the number of parameter updates between forward and backward passes. To alleviate the resulting pipeline bubbles, AMDP launches multiple concurrent pipelines and adapts their number according to pipeline depth. In addition, AMDP accumulates gradients across minibatches and applies them in a single update, ensuring that only a bounded number of minibatches experience parameter mismatch, limited to within one optimization step. Experiments on GPT- and BERT-style models demonstrate that AMDP significantly accelerates training while preserving convergence.

2605.29659 2026-05-29 cs.LG cs.AI cs.CL

Opir: Efficient Multi-Task Safety Classification for Toxicity, Jailbreaks, Hate Speech, and Harmful Content

Opir:针对毒性、越狱、仇恨言论和有害内容的高效多任务安全分类

Ihor Stepanov, Aleksandr Smechov

AI总结 本文提出基于GLiClass架构的Opir系列编码器护栏模型,通过多任务学习实现二进制安全/不安全分类、多标签毒性分类、越狱分类和零样本不安全提示与响应分类,在12项安全分类任务和17项类别任务上与现有护栏系统竞争,同时部署开销更小。

详情
Comments
23 pages, 4 figures, 9 tables
AI中文摘要

大型语言模型(LLM)应用的实时安全过滤需要能够检测不安全提示、有毒语言、越狱尝试和不安全响应的分类器,且不能像大型护栏模型那样成本高昂,同时要能区分良性的敏感文本与真正隐蔽的有害内容。在本文中,我们介绍了Opir,一个基于GLiClass架构的编码器护栏模型系列。Opir包括用于二进制安全/不安全分类、多标签毒性分类、越狱分类以及零样本不安全提示和响应分类的多任务模型。我们还发布了专门用于二进制安全/不安全分类的边缘变体,参数少于1亿。这些模型在一个三级分类体系上训练,该体系包含16个顶层标签、126个中层标签和854个叶标签,共996个类别。Opir的训练数据结合了基于分类体系的不安全提示、对抗性挖掘的难负例、良性安全保持示例、生成的响应示例、多语言翻译以及Aegis2和WildGuard训练子集的部分内容。我们还开源了一个评估工具,支持GLiClass和GLiNER2后端以及基于解码器的模型,涵盖二进制安全分类、多标签分类、毒性、越狱检测、提示安全、响应安全、响应拒绝以及跨公共基准系列的提示子类别视图。在与八个当代护栏系统(包括基于GLiNER2和生成式护栏模型)的扩展比较中,涵盖12项安全分类任务和17项类别任务,Opir变体在大多数基准数据集上与最强的开源基线模型竞争或领先,同时部署规模显著更小。

英文摘要

Real-time safety filtering for large language model (LLM) applications requires classifiers that can detect unsafe prompts, toxic language, jailbreak attempts, and unsafe responses without the cost profile of large guardrail models, and that can distinguish benign sensitive text from genuinely covert harmful content. In this paper, we introduce Opir, a family of encoder-based guardrail models built on the GLiClass architecture. Opir includes multi-task models for binary safe/unsafe classification, multi-label toxicity classification, jailbreak classification, and zero-shot unsafe prompt and response categorization. We also release edge variants with fewer than 100M parameters dedicated to binary safe/unsafe categorization. The models are trained on a three-level taxonomy containing 996 categories across 16 top-level labels, 126 mid-level labels, and 854 leaf labels. Opir's training data combines taxonomy-grounded unsafe prompts, adversarially mined hard negatives, benign safety-preserving examples, generated response examples, multilingual translations, and portions of the Aegis2 and WildGuard training subsets. We also open-sourced an evaluation harness that supports GLiClass and GLiNER2 backends as well as decoder-based models, and covers binary safety classification, multi-label categorization, toxicity, jailbreak detection, prompt safety, response safety, response refusal, and prompt subcategory views across public benchmark families. Across an expanded comparison spanning 12 safety-classification tasks and 17 category tasks against eight contemporary guardrail systems -- including both GLiNER2-based and generative guardrail models -- Opir variants are competitive on or ahead of the strongest open-weight baselines on the majority of benchmark datasets while operating with a substantially smaller deployment footprint.

2605.29657 2026-05-29 cs.CV cs.AI

OccamToken: Efficient VLM Inference with Training-Free and Budget-Adaptive Token Pruning

OccamToken: 无需训练且预算自适应的令牌剪枝实现高效VLM推理

Geng Li, Guohao Chen, Ting Chen, Shilin Shan, Kuangji Zuo, Bofan Lyu, Tuo An, Gen Li, Jianfei Yang

AI总结 提出OccamToken框架,通过寄存器锚定的相对证据测试替代绝对排名范式,实现无需训练、自适应预算的视觉令牌剪枝,在保持高精度的同时大幅压缩令牌数量。

详情
Comments
26 pages,8 figures
AI中文摘要

视觉语言模型(VLM)依赖长视觉令牌序列进行视觉理解,导致预填充阶段在计算和内存上开销巨大。现有大多数剪枝方法遵循绝对排名范式,为视觉令牌分配重要性分数并保留固定的Top-K子集。本文认为这种范式本质上是脆弱的:注意力汇聚点扭曲令牌重要性排名,而图像冗余和查询依赖的视觉证据使得固定令牌预算在不同输入间不可靠。我们提出OccamToken,一个无需训练的框架,用寄存器锚定的相对证据测试替代绝对令牌排名。OccamToken不询问哪些令牌全局重要,而是评估视觉令牌是否提供了超越寄存器基线的信息。我们的关键洞察是,寄存器令牌自然吸收低信息注意力模式,使其成为识别真正信息性视觉证据的稳定参考。基于这一原理,OccamToken通过从寄存器注意力中导出的动态阈值,执行图像自适应冗余剪枝和查询自适应相关性剪枝。在LLaVA-NeXT、LLaVA-v1.5和Qwen3-VL上,OccamToken一致地改善了准确率-效率权衡,无需额外训练。值得注意的是,在LLaVA-NeXT上,它将2880个视觉令牌减少到约40个,同时保留了超过93%的原始准确率,即使在极端的1.4%保留率下也能实现稳定的视觉令牌压缩。

英文摘要

Vision-language models (VLMs) rely on long visual token sequences for visual understanding, making the prefill stage expensive in both computation and memory. Most existing pruning methods follow an absolute-ranking paradigm, assigning importance scores to visual tokens and retaining a fixed top-K subset. In this work, we argue that this paradigm is fundamentally brittle: attention sinks distort token importance rankings, while image redundancy and query-dependent visual evidence make fixed token budgets unreliable across inputs. We propose OccamToken, a training-free framework that replaces absolute token ranking with register-anchored relative evidence testing. Instead of asking which tokens are globally important, OccamToken evaluates whether a visual token provides information beyond a register-based reference. Our key insight is that register tokens naturally absorb low-information attention patterns, making them a stable reference for identifying genuinely informative visual evidence. Based on this principle, OccamToken performs both image-adaptive redundancy pruning and query-adaptive relevance pruning through dynamic thresholds derived from register attention. Across LLaVA-NeXT, LLaVA-v1.5, and Qwen3-VL, OccamToken consistently improves the accuracy-efficiency trade-off without additional training. Notably, on LLaVA-NeXT, it reduces 2,880 visual tokens to approximately 40 while preserving over 93% of the original accuracy, enabling stable visual token compression even in the extreme 1.4% retention regime.

2605.29656 2026-05-29 cs.AI

TRACE: Toulmin-based Reasoning Assessment through Constructive Elements for LLM CoT Evaluation

TRACE: 基于图尔敏论证元素的 LLM 思维链推理评估

Yundong Kim, Heyoung Yang

AI总结 提出 TRACE 指标,结合图尔敏论证理论与弗拉维尔元认知框架分析思维链推理结构,实验表明与基准准确率强相关(r=0.74)并可作为有效强化学习奖励信号。

详情
Comments
23 pages, Accepted at ICML 2026
AI中文摘要

由于缺乏真实答案,评估大型语言模型(LLM)的开放式输出仍然具有挑战性。现有指标依赖于最终答案的准确性或表面统计,而未检查推理过程本身。我们提出 TRACE(基于图尔敏论证元素的推理评估),一种分析思维链(CoT)推理过程的指标。TRACE 不判断结果,而是通过整合图尔敏的论证理论与弗拉维尔的元认知框架来检查论证的构建方式,从而评估推理结构。在 7 个推理模型的 26.3K QA 样本上的实验表明,TRACE 与基准准确率强相关(r=0.74)。此外,TRACE 作为强化学习奖励信号有效,优于仅基于准确率的基线。这些结果共同表明,逻辑合理的推理能带来更高质量的答案。因此,TRACE 可作为评估开放式输出的补充指标。代码可在 https://github.com/hyyangkisti/trace 获取。

英文摘要

Evaluating open-ended outputs from large language models (LLMs) remains challenging due to the absence of ground truth. Existing metrics rely on final-answer accuracy or surface-level statistics, leaving the reasoning process itself unexamined. We introduce TRACE (Toulmin-based Reasoning Assessment through Constructive Elements), a metric that analyzes Chain-of-Thought (CoT) reasoning processes. Rather than judging outcomes, TRACE inspects how arguments are constructed by integrating Toulmin's argumentation theory with Flavell's metacognitive framework to assess reasoning structure. Experiments on 26.3K QA samples across 7 reasoning models show strong correlation with benchmark accuracy (r=0.74). Furthermore, TRACE is effective as a reinforcement learning reward signal, outperforming accuracy-only baselines. Together, these results indicate that logically sound reasoning leads to higher-quality answers. TRACE thus serves as a complementary metric for evaluating open-ended outputs. Code is available at https://github.com/hyyangkisti/trace.

2605.29653 2026-05-29 cs.AI

PTCG-Bench: Can LLM Agents Master Pokémon Trading Card Game?

PTCG-Bench:LLM智能体能否掌握宝可梦集换式卡牌游戏?

Dongdong Hua, Yifei Sun, Renhong Huang, Feng Gao, Chunping Wang, Yang Yang

AI总结 提出PTCG-Bench基准,通过宝可梦集换式卡牌游戏评估LLM智能体的决策性能和自进化能力,并设计模块化消融实验分析智能体性能。

详情
AI中文摘要

面对一个策略复杂的棋盘游戏,人类玩家在玩几轮后就能快速学会制定策略。自主智能体在现实交互环境中需要类似的能力,然而现有的智能体基准往往未能充分捕捉这种策略性和不断演变的决策场景。我们提出了PTCG-Bench,一个基于宝可梦集换式卡牌游戏(PTCG)构建的基准,它在两个互补层面上评估LLM智能体:(1)它们在单个复杂环境中的决策性能,以及(2)它们通过积累经验自我进化的能力。我们进一步包括一个模块化消融实验,以更好地解释智能体性能,而不将其与模型能力混为一谈。我们的实验表明,尽管LLM智能体能够实现非平凡的 gameplay 性能,但持续稳定的自我进化仍然具有挑战性,并且性能对消融设计敏感。我们希望PTCG-Bench能够促进未来在现实交互环境中对消融感知和自我进化智能体的研究。

英文摘要

Given a strategically complex board game, human players can quickly learn to devise strategies after playing a few rounds. Autonomous agents require similar capabilities in realistic interactive environments, yet existing agent benchmarks often fail to fully capture such strategic and evolving decision-making scenarios. We present PTCG-Bench, a benchmark built on the Pok'{e}mon Trading Card Game (PTCG) that evaluates LLM agents at two complementary levels: (1) their decision-making performance within a single complex environment, and (2) their ability to self-evolving through accumulated experience. We further include a modular harness ablation to better interpret agent performance without conflating it with model capability. Our experiments show that, although LLM agents can achieve non-trivial gameplay performance, sustained and stable self-evolution remains challenging, and performance is sensitive to harness design. We hope that PTCG-Bench will facilitate future research on harness-aware and self-evolving agents in realistic interactive environments.

2605.29652 2026-05-29 cs.AI

Think Fast, Talk Smart: Partitioning Deterministic and Neural Computation for Structured Health Text Generation

快速思考,智能对话:结构化健康文本生成中确定性与神经计算的划分

Kai-Chen Cheng, Haejun Han, David Q. Sun

AI总结 提出一种将确定性计算与有限LLM调用相结合的流水线,用于结构化健康文本生成,在降低错误率和成本的同时保持忠实性。

详情
AI中文摘要

大型语言模型(LLM)越来越多地被用于从结构化记录(如可穿戴时间序列、生物标志物、生命体征和护理管理日志)生成健康文本。对于重复性健康输出,流畅性是不够的:系统必须忠实于源数据,将解释性主张建立在可用证据上,遵循既定政策,输出机器可读的内容,并且运行成本足够低以支持重复使用。我们探讨在结构化健康生成中,哪些责任应由确定性计算承担,而非运行时LLM提示。我们引入了“快速思考,智能对话”,一个睡眠健康洞察流水线,其中确定性代码在调用一次有界LLM写入器之前执行重复分析。在280个用户-夜晚和六个模型上,与结构化零样本和少样本单次调用基线相比,该方法实现了更低的数值误差、更低的指令合规误差和更低的端到端成本。层替换揭示了特定合约的失败:LLM比较增加了数值误差,LLM排名降低了策略选择,LLM属性增加了无根据的因果语言,而LLM生成的写入器接口即使在上游事实确定后也会重新引入误差。结果支持一个更广泛的设计规则:让代码负责重复分析,让LLM在有界接口内表达已验证的事实。

英文摘要

Large language models (LLMs) are increasingly being used to generate health text from structured records such as wearable time series, biomarkers, vitals, and care-management logs. For recurring health outputs, fluency is not enough: systems must remain faithful to source data, ground explanatory claims in available evidence, follow stated policies, emit machine-readable outputs, and run cheaply enough for repeated use. We ask which responsibilities in structured health generation should be deterministic computation rather than runtime LLM prompting. We introduce Think Fast, Talk Smart, a sleep-health insight pipeline in which deterministic code performs recurring analysis before one bounded LLM writer call. Across 280 user-nights and six models, achieves lower numeric error, lower instruction-compliance error, and lower end-to-end cost than structured zero-shot and few-shot one-call baselines. Layer replacement reveals contract-specific failures: LLM comparison raises numeric error, LLM ranking degrades policy selection, LLM attribution increases unsupported causal language, and an LLM-generated writer interface reintroduces errors even after upstream facts are deterministic. The results support a broader design rule: let code own recurring analysis, and let LLMs express verified facts within bounded interfaces.

2605.29649 2026-05-29 cs.AI

LLM-Evolved Domain-Independent Heuristics for Symbolic AI Planning

LLM进化的符号AI规划领域无关启发式

Elliot Gestrin, Jendrik Seipp

AI总结 本文使用进化搜索让大语言模型生成领域无关的启发式函数,在未见测试域上超越手工最优启发式,并首次系统评估了启发式的信息性-速度权衡。

详情
AI中文摘要

启发式搜索是符号AI规划中的主导范式,最强的启发式是规划研究者数十年工作的成果。最近的工作表明,大型语言模型(LLM)可以为单个规划领域设计启发式,但迄今为止,没有LLM生成的启发式能在任意规划任务上工作。在本文中,我们使用进化搜索来产生第一个LLM生成的领域无关启发式,其超越了手工最优的现有技术。我们让LLM变异用C++编写的父启发式,将候选解存储在MAP-Elites档案中,以信息性和速度作为键,并通过混合覆盖率和求解时间计算适应度分数。为了将进化程序置于上下文中,我们还额外基准测试了一组广泛的手工启发式在信息性-速度权衡上的表现,据我们所知,这之前从未做过。在未见测试域上,我们最好的进化启发式比最强基线解决了更多任务,我们的完整启发式套件跨越了所述权衡的帕累托前沿。我们还发现,从平凡的盲目启发式开始进化优于从强FF启发式开始,即使最终程序本身是FF变体,并且LLM推理努力影响候选编译成功的频率远大于影响那些编译成功的候选的质量。由于进化程序是纯C++,它们可以作为即插即用替代品插入现有规划器,并继承底层搜索的健全性和完备性保证。

英文摘要

Heuristic search is the dominant paradigm in symbolic AI planning, and the strongest heuristics are the result of decades of work by planning researchers. Recent work has shown that large language models (LLMs) can design heuristics for individual planning domains, but no LLM-generated heuristic has so far worked on arbitrary planning tasks. In this paper, we use evolutionary search to produce the first LLM-generated domain-independent heuristics that exceed the hand-engineered state of the art. We let an LLM mutate parent heuristics written in C++, store candidates in a MAP-Elites archive keyed on informedness and speed and calculate fitness scores by blending coverage with solving time. To place the evolved programs in context, we additionally benchmark a broad set of hand-engineered heuristics on their informedness-speed tradeoff, which to our knowledge has not been done before. On unseen testing domains, our best evolved heuristic solves more tasks than even the strongest baseline, with our full heuristic suite spanning the Pareto frontier of said tradeoff. We also find that seeding evolution from the trivial blind heuristic outperforms seeding from the strong FF heuristic, even when the resulting program is itself an FF variant, and that LLM reasoning effort affects how often candidates compile much more than the quality of those that do. Because the evolved programs are plain C++, they slot into existing planners as drop-in replacements and inherit the soundness and completeness guarantees of the underlying search.

2605.29648 2026-05-29 cs.CL

Verifiable Rewards Beyond Math and Code: Lightweight Corpus-Grounded Process Supervision for Factual Question Answering

超越数学与代码的可验证奖励:面向事实问答的轻量级语料库基础过程监督

Shicheng Fan, Haochang Hao, Dehai Min, Weihao Liu, Philip S. Yu, Lu Cheng

AI总结 提出CorVer,一种基于语料库共现统计的轻量级过程奖励方法,通过句子级信用分配和令牌级优势映射,在多个模型和基准上显著提升事实问答准确性且训练速度更快。

详情
AI中文摘要

将强化学习应用于提高知识密集型问答的事实准确性面临奖励设计困境。响应级奖励仅提供粗略监督,无法区分推理轨迹中的正确与错误陈述。句子级替代方案提供更细粒度的反馈,但通常依赖于NLI验证器、LLM评判或知识验证流水线,这些方法在RL规模下部署成本高昂,且对于稀有实体事实(准确奖励信号尤为重要)往往不可靠。我们提出CorVer(语料库验证),一种轻量级、即插即用的过程奖励,用源自维基百科共现统计的语料库基础信号替代神经验证器。CorVer分配句子级信用,并通过简单对齐将其映射到令牌级优势,仅需一个0.5B的提取器和每个句子一次语料库查找。在跨越六个指令微调模型(3B至14B)和五个QA基准的30个(模型,基准)单元中,CorVer在每个单元上均优于原始基线,TriviaQA平均提升4.1个百分点。在其可行配置下的20个单元中,CorVer在18个单元上优于四个神经验证器基线,同时训练速度快4.8至8.4倍。

英文摘要

Applying reinforcement learning to improve factual accuracy in knowledge-intensive question answering faces a reward design dilemma. Response-level rewards provide only coarse supervision and cannot distinguish correct from incorrect statements within a reasoning trace. Sentence-level alternatives offer finer-grained feedback, but typically rely on NLI verifiers, LLM judges, or knowledge-verification pipelines that are expensive to deploy at RL scale and often unreliable for rare-entity facts, where accurate reward signals are especially important. We propose CorVer (Corpus Verify), a lightweight, plug-in-ready process reward that replaces neural verifiers with a corpus-grounded signal derived from Wikipedia co-occurrence statistics. CorVer assigns sentence-level credit and maps it to token-level advantages via a simple alignment, requiring only a 0.5B extractor and a single corpus lookup per sentence. Across 30 (model, benchmark) cells spanning six instruction-tuned models (3B to 14B) and five QA benchmarks, CorVer improves over the raw baseline for every cell, with an average TriviaQA gain of +4.1 pp. It also outperforms four neural-verifier baselines in 18 of 20 cells under their feasible configurations, while training 4.8 to 8.4x faster.

2605.29647 2026-05-29 cs.CV

MARTIAN: A Rendering Framework for Aerial Mars Imagery from HiRISE Orbital Data

MARTIAN:基于HiRISE轨道数据的火星空中影像渲染框架

Dario Pisanti, Georgios Georgakis

AI总结 提出一个基于Blender的开源渲染框架MARTIAN,利用真实HiRISE轨道地图数据合成火星地形在不同光照和高度下的逼真空中视图,并生成精确姿态标注,以解决火星视觉导航训练数据稀缺问题。

详情
AI中文摘要

火星上的空中导航需要基于视觉的管道,这些管道必须对火星表面的多样光照条件和地形形态具有鲁棒性。训练和评估此类方法的一个关键瓶颈是缺乏大规模、带标注的空中数据集。我们提出了MARTIAN,一个基于Blender的开源渲染框架,它利用真实的HiRISE轨道地图产品,在可控光照条件和不同高度下合成火星地形的逼真空中视图。MARTIAN生成带有精确姿态标注的观测数据,直接解决了火星视觉导航训练数据稀缺的问题。该框架已通过其在基于地图的定位系统(用于Ingenuity和未来火星旋翼机)的并行工作中的部署得到验证,其中合成训练的深度图像匹配器已成功在真实火星图像上进行了评估。MARTIAN公开于:https://github.com/nasa-jpl/martian。

英文摘要

Aerial navigation on Mars requires vision-based pipelines that are robust to the diverse illumination conditions and terrain morphology of the Martian surface. A key bottleneck for training and evaluating such methods is the scarcity of large-scale, annotated aerial datasets. We present MARTIAN, an open-source Blender-based rendering framework that leverages real HiRISE orbital map products to synthesize realistic aerial views of the Martian terrain under controllable lighting conditions and at varying altitudes. MARTIAN generates observations with accurate pose annotations, directly addressing the scarcity of training data for vision-based navigation on Mars. The framework has been validated through its deployment in concurrent work on map-based localization systems for Ingenuity and future Mars rotorcraft, where synthetically trained deep image matchers were successfully evaluated on real Mars imagery. MARTIAN is publicly available at: https://github.com/nasa-jpl/martian.

2605.29645 2026-05-29 cs.LG cs.AI stat.ML

The Sample Complexity of Multiclass and Sparse Contextual Bandits

多类别和稀疏上下文赌博机的样本复杂度

Liad Erez, Fan Chen, Alon Cohen, Tomer Koren, Yishay Mansour, Shay Moran, Alexander Rakhlin

AI总结 针对随机i.i.d.上下文赌博机,提出基于决策估计系数和低方差探索的算法,在稀疏奖励下实现接近最优的样本复杂度,并匹配下界。

详情
AI中文摘要

我们研究随机i.i.d.设置下的上下文赌博机,其中学习器观察来自未知分布的上下文,从有限集合$A$中选择动作,并旨在基于赌博机反馈从给定类别中识别近似最优策略。受零一奖励的赌博机多类别分类启发,我们关注\emph{$s$-稀疏}设置,其中对于每个上下文,奖励向量的$L_1$范数至多为$s \ll |A|$。我们的主要结果是设计算法,以高概率输出一个相对于策略类$Π$的$ε$-最优策略,使用$ ilde{O} ((s/ε^2 + |A|/ε)\log |Π|/δ)$个样本。我们将此界推广到一般Natarajan类,并补充了匹配的下界(对数因子内),从而缩小了先前工作(Erez等人,2024, 2025)留下的巨大差距,后者额外增加了$Θ(|A|^9)$依赖。我们通过两种互补方法获得这些结果。首先,我们从具有结构化观测的上下文决策角度分析上下文赌博机,设计了一种探索-优化算法,其样本复杂度由\emph{决策估计系数}(DEC;Foster等人,2021, 2022)控制。我们证明,在$s$-稀疏奖励下,诱导的模型类具有随$s$缩放的尖锐DEC界,直接产生最优速率。由于这种方法主要是信息论性的,并涉及求解复杂的min-max优化问题,我们还开发了第二种更专门的算法方法,基于低方差探索技术。这种方法产生了具体、易处理的算法,并自然地扩展到上下文组合半赌博机,为赌博机多类别列表分类提供了改进的样本复杂度保证。

英文摘要

We study contextual bandits in the stochastic i.i.d.\ setting, where a learner observes contexts drawn from an unknown distribution, selects actions from a finite set $A$, and aims to identify an approximately optimal policy from a given class based on bandit feedback. Motivated by bandit multiclass classification with zero-one rewards, we focus on the \emph{$s$-sparse} setting in which, for every context, the reward vector has $L_1$-norm at most $s \ll |A|$. Our main result is the design of algorithms that, with high probability, output an $ε$-optimal policy compared to policy class $Π$ using $\tilde{O} ((s/ε^2 + |A|/ε)\log |Π|/δ)$ samples. We extend this bound to general Natarajan classes and complement it with a matching lower bound (up to logarithmic factors), thereby closing a substantial gap left by prior work (Erez et al., 2024, 2025), which incurred an additional $Θ(|A|^9)$ dependence. We obtain these results via two complementary approaches. First, we analyze contextual bandits through the lens of contextual decision making with structured observations, designing an exploration-by-optimization algorithm whose sample complexity is governed by the \emph{decision-estimation coefficient} (DEC; Foster et al., 2021, 2022). We show that, with $s$-sparse rewards, the induced model class admits a sharp DEC bound that scales with $s$ and directly yields the optimal rate. Since this approach is largely information-theoretic and involves solving complex min-max optimization problems, we also develop a second, more specialized algorithmic method based on a low-variance exploration technique. This approach leads to concrete, tractable algorithms and naturally extends to contextual combinatorial semi-bandits, leading to improved sample complexity guarantees for bandit multiclass list classification.

2605.29643 2026-05-29 cs.CV cs.MA

AgentCVR: Active Multi-Agent Cross-Video Reasoning via Script-Simulated Reinforcement Learning

AgentCVR:通过脚本模拟强化学习的主动多智能体跨视频推理

Yilun Qiu, Jiahe Wang, Cilin Yan, Jiayin Cai, Xiaolong Jiang, Yao Hu, Chun Yuan

AI总结 提出AgentCVR多智能体框架,将跨视频推理视为主动证据获取任务,通过主智能体协调视觉和音频智能体进行定向证据提取,并引入脚本模拟强化学习优化策略,在跨视频对齐和定位任务上超越单次基线,达到与闭源系统相当的性能。

详情
AI中文摘要

跨视频推理(CVR)已成为多模态智能的关键前沿,要求模型检索、对齐和聚合分布在多个视频中的证据。当前的多模态大语言模型(MLLMs)往往难以应对CVR,因为简单的单次策略将多个视频编码到共享压缩上下文中,可能掩盖罕见但关键的证据。在本文中,我们提出AgentCVR,一个多智能体框架,将CVR视为主动证据获取任务。AgentCVR使用主智能体迭代协调专门的视觉和音频智能体进行定向证据提取。为确保高效训练,我们引入脚本模拟强化学习,利用LLM生成的语义脚本和轻量级文本模拟器优化智能体策略,在在线探索期间避免昂贵的多模态推理。在综合CVR基准上的实验结果表明,AgentCVR优于单次基线,并在复杂跨视频对齐和定位任务上达到与最先进闭源系统相当的性能。为确保可复现性,我们的代码可在https://github.com/wang-jh24/AgentCVR获取。

英文摘要

Cross-Video Reasoning (CVR) has emerged as a critical frontier in multimodal intelligence, requiring models to retrieve, align, and aggregate evidence distributed across multiple videos. Current Multimodal Large Language Models (MLLMs) often struggle with CVR, as simple single-pass strategies encode multiple videos into a shared compressed context, potentially obscuring rare but critical evidence. In this paper, we propose AgentCVR, a multi-agent framework that treats CVR as an active evidence-acquisition task. AgentCVR employs a Master Agent to iteratively coordinate specialized Visual and Audio Agents for targeted evidence extraction. To ensure efficient training, we introduce Script-Simulated RL, which optimizes the agent's policy with LLM-generated semantic scripts and a lightweight text-based simulator, bypassing costly multimodal inference during online exploration. Experimental results on a comprehensive CVR benchmark show that AgentCVR outperforms single-pass baselines and achieves comparable performance to state-of-the-art closed-source systems, particularly in complex cross-video alignment and localization. To ensure reproducibility, our code is available at https://github.com/wang-jh24/AgentCVR.

2605.29642 2026-05-29 stat.ML cs.IT cs.LG math.IT

Matching Rates and Optimal Allocation for Federated Probe-Logit Distillation under Heterogeneous Bandwidth Budgets

异构带宽预算下的联邦探针-逻辑蒸馏匹配率与最优分配

Prasanjit Dubey, Xiaoming Huo

AI总结 针对联邦探针-逻辑蒸馏(FPLD)中带宽项速率紧性及异构节点带宽分配问题,提出匹配下界、多轮改进方案及闭合形式最优分配规则。

详情
AI中文摘要

在联邦语言建模中,$K$个节点各自持有$n$个样本,但无法合并数据或交换全精度梯度或权重。我们研究当每个节点在公共探针集上每次查询最多上传$B$比特时,对$V$个令牌上的条件分布进行估计的极小极大速率。在联邦探针-逻辑蒸馏(FPLD)中,每个节点在探针集上传输一个标量量化的逻辑向量,聚合器蒸馏出一个全局参数化学生模型。先前的工作(Dubey and Huo, 2026)建立了高概率KL速率$O(d/(Kn) + ρ\sqrt{V \log V / m} + K^{-1} \cdot 2^{-2B/V})$加上优化松弛项,其中带宽项采用迹锐化形式。该带宽项速率是否紧致,以及上界如何推广到异构每节点带宽,仍是开放问题。 我们填补了这两个空白。首先,抖动FPLD构造在非退化条件下具有匹配的单轮下界$Ω(K^{-1} \cdot 2^{-2B/V})$,将带宽轴速率确定为$Θ(K^{-1} \cdot 2^{-2B/V})$。使用嵌套/缩放残差量化器的$T$轮顺序细化达到$O(K^{-1} \cdot 2^{-2TB/V})$;对于任意$T > 1$,原始FPLD的与$T$无关的带宽项是次优的。其次,我们建立了每节点预算$B_i$的异构带宽上界,并配以闭合形式的最优分配$B_i^* = B_{\mathrm{tot}}/K + (V/2) \log_2(w_i / ar{w}_g)$,这是一种对数倾斜的注水规则,是失真率优化中反向注水的每节点类比。一种即插即用自适应变体通过短预热阶段估计权重,并达到$1 + O(\sqrt{\log(K/δ)/(m T_0)})$的相对次优性。合成n-gram模拟证实经验KL被上界和下界所界定,并且在异构裁剪下最优分配严格优于均匀和逆权重基线。

英文摘要

In federated language modeling, $K$ nodes each hold $n$ samples but cannot pool data or exchange full-precision gradients or weights. We study the minimax rate at which a conditional distribution over $V$ tokens can be estimated when each node may upload at most $B$ bits per query in a public probe set. In federated probe-logit distillation (FPLD), each node transmits a scalar-quantized logit vector on the probe set, and an aggregator distills a global parametric student. Prior work (Dubey and Huo, 2026) establishes a high-probability KL rate $O(d/(Kn) + ρ\sqrt{V \log V / m} + K^{-1} \cdot 2^{-2B/V})$ plus optimization slack, with the bandwidth term in its trace-sharpened form. Whether this bandwidth-term rate is tight, and how the upper bound generalizes to heterogeneous per-node bandwidths, are left open. We close both gaps. First, the dithered FPLD construction has a matching single-round lower bound $Ω(K^{-1} \cdot 2^{-2B/V})$ under non-degeneracy, pinning the bandwidth-axis rate at $Θ(K^{-1} \cdot 2^{-2B/V})$. $T$-round sequential refinement with nested/scaled residual quantizers achieves $O(K^{-1} \cdot 2^{-2TB/V})$; vanilla FPLD's $T$-independent bandwidth term is suboptimal for every $T > 1$. Second, we establish a heterogeneous-bandwidth upper bound for per-node budgets $B_i$, paired with a closed-form optimal allocation $B_i^* = B_{\mathrm{tot}}/K + (V/2) \log_2(w_i / \bar{w}_g)$, a log-tilted water-filling rule that is the per-node analogue of reverse water-filling for distortion-rate optimization. A plug-in adaptive variant estimates the weights from a short warm-up phase and attains $1 + O(\sqrt{\log(K/δ)/(m T_0)})$ relative suboptimality. Synthetic n-gram simulations confirm that empirical KL is bracketed by the upper and lower bounds and that the optimal allocation strictly dominates uniform and inverse-weighted baselines under heterogeneous clipping.

2605.29638 2026-05-29 cs.CL

Classification of non-analyzable word types in web documents to implement an effective Korean e-learning system

网络文档中不可分析词类型的分类以实现有效的韩语电子学习系统

Sang-Taek Park, Ae-Lim Ahn, Eric Laporte, Jee-Sun Nam

AI总结 通过构建正式与非正式语料库,比较其表达差异,并提出局部语法图(LGG)模型以有效处理非正式文本,用于韩语电子学习系统。

详情
Journal ref
Doing Research in Applied Linguistics, 2011, pp. 61-68
AI中文摘要

电子学习系统应传递反映语言实际使用中各种现象的内容。除了正式韩语,包含网络文档、手机短信或推特帖子等真实世界韩语表达的电子学习系统将对高级学习者有用。我们构建了两种语料库:一种由在线新闻文章等正式文档组成;另一种由网络博客中关于新产品的客户评论等非正式文档组成。通过比较这些语料库,我们展示了这两种语料库中表达的差异。我们调查了非正式语料库的主要特征。鉴于文本中有很大比例是非正式的,我们提出局部语法图(LGG)作为在韩语电子学习系统中有效处理它们的合适模型。

英文摘要

E-learning systems should deliver contents that reflect various phenomena of the language as it is used. In addition to formal Korean, e-learning systems that would include real-world Korean expressions such as those in web documents, mobile text messages, or twitter posts, would be useful to high-level learners. We construct two types of corpora: one is made of formal documents like online news articles; the other is made of informal documents like customer reviews about new products in web blogs. By comparing these corpora, we show how expressions differ in these two types of corpora. We survey the main characteristics of the informal corpus. Given that a significant proportion of text is informal, we propose Local Grammar Graphs (LGG) as an appropriate model to treat them effectively in Korean e-learning systems.

2605.29637 2026-05-29 cs.CL

Evaluating Cross-lingual Knowledge Consistency in Code-Mixed vis-a-vis Indian Languages using IndicKLAR

评估混合语码与印度语言中的跨语言知识一致性:基于IndiKLAR

Debajyoti Mazumder, Divyansh Pathak, Prashant Kodali, Aditya Joshi, Akshay Agarwal, Jasabanta Patro

AI总结 本文通过构建IndiKLAR基准,评估大语言模型在英语、混合语码和印度本土语言上的知识召回一致性,发现混合语码输入能显著缩小与英语的性能差距,并识别出从本土语言到混合语码的“翻转点”。

详情
Comments
23 pages
AI中文摘要

大型语言模型能可靠地回忆英语知识,但在低资源语言上对相同查询却常常失败——这种跨语言一致性差距在印度语言及其混合语码变体中尚未得到充分研究。为了研究这一差距,我们引入了IndiKLAR,这是KLAR-CLC基准的印度扩展,覆盖了22种印度官方语言中的18种,并为11种广泛使用的语言对配对了混合语码变体,且对这11种设置的单语和混合语码变体进行了母语验证。这种三方对齐提供了一个独特的机会来考察知识召回一致性如何随英语、混合语码和印度本土语言输入的变化而变化。在九个开放权重模型上的评估发现,本土语言与英语的准确率差距可达约0.50,而混合语码输入能缩小大部分差距——无需任何模型层面的干预即可使性能接近英语(差距约0.05)。受此启发,我们评估了几种在语言转换暴露方式上有所不同的提示策略,包括两阶段的翻译-回答设置、单阶段的联合翻译-回答提示,以及“翻译中思考”(TinT)——一种单步策略,模型内部转换输入并仅输出最终答案。在从本土语言到混合语码再到英语的性能轨迹中,我们识别出一个一致的翻转点——即错误与正确预测之间的边界——位于本土语言和混合语码设置之间。有趣的是,无论该轨迹是由输入表面形式还是由模型的内部转换过程诱导,这一现象都成立。

英文摘要

Large language models recall knowledge reliably in English but often fail on the same query posed in a lower-resourced language -- a crosslingual consistency gap that remains underexplored for Indian languages and their code-mixed counterparts. To study this gap, we introduce IndiKLAR, an Indic extension of the KLAR-CLC benchmark covering 18 of the 22 scheduled Indian languages and pairing them with code-mixed variants for 11 widely used language pairs, with native-speaker verification of both monolingual and code-mixed variants for these 11 settings. This three-way alignment offers a unique opportunity to examine how knowledge recall consistency varies across the spectrum of English, code-mixed, and native Indian language inputs. Evaluating across nine open-weight models, we find that the native-language accuracy gap to English can reach $\sim$0.50, while code-mixed inputs close most of it -- bringing performance within $\sim$0.05 of English without any model-level intervention. Motivated by this, we evaluate several prompting strategies that vary in how language conversion is exposed, including a two-stage translate-then-answer setup, a one-stage joint translation-and-answer prompt, and Translate-in-Thought (TinT) -- a single-step strategy in which the model converts the input internally and emits only the final answer. Across the performance trajectory native $\rightarrow$ code-mixed $\rightarrow$ English, we identify a consistent flip point -- the boundary between incorrect and correct prediction -- that lies between the native and code-mixed settings. Interestingly, this holds whether the trajectory is induced by the input surface form or by the model's internal conversion process.

2605.29635 2026-05-29 math.OC cs.LG

MoSSP: A Momentum-Based Single-Loop Stochastic Penalty Method for Nonconvex Constrained DC-Regularized Optimization

MoSSP: 基于动量的单环随机惩罚方法用于非凸约束DC正则化优化

Luxuan Li, Chunfeng Cui, Xiao Wang

AI总结 提出MoSSP算法,一种基于动量的单环随机惩罚方法,用于解决具有非凸约束和DC正则化的随机优化问题,实现了O(ε^{-4})和O(ε^{-3})的oracle复杂度。

详情
Comments
35 pages, 3 figures
AI中文摘要

本文研究了一类具有差凸(DC)正则化的非凸约束随机问题,其中可行集可能是非凸的,且DC正则化子的凹部分允许非光滑。基本挑战在于在保持非凸约束可行性的同时实现良好的oracle复杂度。尽管单环算法能有效解决无约束DC优化问题,但它们在具有DC结构的约束优化中的潜力尚未被充分探索。为填补这一空白,我们开发了MoSSP,一种基于动量的单环随机惩罚方法,用于此类问题,并具有可证明的复杂度保证。关键思想是将单个随机近端梯度步骤应用于惩罚的Moreau包络加上凸DC部分,同时并行计算凹部分的近端映射。我们推导了两种算法变体:一种具有O(ε^{-4}) oracle复杂度的Polyak动量版本,用于寻找随机ε-KKT点,以及一种改进的O(ε^{-3})版本,结合了递归动量。实验结果证明了所提算法的有效性。

英文摘要

In this paper, we study a structured class of nonconvex constrained stochastic problems with difference-of-convex (DC) regularization, where the feasible set is possibly nonconvex and the concave part of the DC regularizer is allowed to be nonsmooth. The fundamental challenge lies in maintaining feasibility for nonconvex constraints while achieving favorable oracle complexity. Although single-loop algorithms efficiently solve unconstrained DC optimization problems, their potential for constrained optimization with DC structure remains largely unexplored. To address this gap, we develop MoSSP, a Momentum-based Single-loop Stochastic Penalty method for such problems with provable complexity guarantees. The key idea is to apply a single stochastic proximal-gradient step to the Moreau envelope of the penalty plus the convex DC part, with the concave part's proximal mapping computed in parallel. We derive two algorithm variants: a Polyak-momentum version with $O(\varepsilon^{-4})$ oracle complexity for finding stochastic $\varepsilon$-KKT points, and an improved $O(\varepsilon^{-3})$ version incorporating recursive momentum. Experimental results demonstrate the effectiveness of the proposed algorithms.

2605.29634 2026-05-29 cs.LG

Relational Rank Geometry in Transformers: Detecting and Steering Hidden-State Relation Frames

Transformer中的关系秩几何:检测与引导隐藏状态关系框架

Mazen Kobrosly

AI总结 本文通过Plücker符号熵检测Transformer隐藏状态中元组关系的秩索引几何,并在Llama系列模型上验证了关系探测的可控干预,实现了从关系探测到关系框架干预的受控桥梁。

详情
Comments
32 pages, 9 figures
AI中文摘要

Transformer隐藏状态通常通过局部或低阶对象解释:神经元、稀疏特征、注意力头、残差流方向或激活补丁。本文研究一个互补对象:元组间关系的秩索引几何。我使用Plücker符号熵来测试r元关系是否在隐藏状态空间中留下arity匹配的方向签名。在Llama系列8B、70B和405B检查点上,真实关系元组在预期秩k=r(r=3,...,6)处显示出比随机控制审计中打乱元组更强的方向签名一致性。多模板审计表明,这些效应在表面变化下仍然存在,所有测试的405B行保持正预期秩边际,8B/70B保持正行,但带有构造器特定的混合单元。然后我问相同的关系几何是否可以被引导。在一个边缘网格干净/损坏干预实验中,使用32个提示,行/列框架和答案格式保持不变,而YES/NO关系图发生变化,损坏的隐藏状态关系框架被修补为干净或安慰剂目标。在70B和405B中,干净目标的关系框架路径恢复了干净答案行为和残差关系几何,而仅质心和等范数控制显示出可忽略的恢复。位置/顺序控制进一步将标记点重要性从有序干净框架几何中分离:目标干净形状和跨提示干净形状在标记接口处恢复行为和残差几何,而损坏供体转移、同位置置换/反射、错误位置干净增量、仅质心运动和等范数噪声失败或远低于干净框架路径。结果是从关系探测到关系框架干预的受控桥梁:关系秩几何可以在Transformer隐藏状态中被检测、定位和行为验证。

英文摘要

Transformer hidden states are often interpreted through local or low-order objects: neurons, sparse features, attention heads, residual-stream directions, or activation patches. This paper studies a complementary object: the rank-indexed geometry of relations among token tuples. I use Plucker sign entropy to test whether r-argument relations leave arity-matched orientation signatures in hidden-state space. Across Llama-family 8B, 70B, and 405B checkpoints, true relation tuples show stronger orientation-sign consistency at the expected rank k=r for r=3,...,6 than scrambled tuples under matched random-control audits. Multi-template audits show that the effects survive surface variation, with all tested 405B rows retaining positive expected-rank margins and 8B/70B retaining positive rows with constructor-specific mixed cells. I then ask whether the same relation geometry can be steered. In an edge-grid clean/corrupt intervention assay over 32 prompts, the row/column scaffold and answer format stay fixed while the YES/NO relation map changes, and the corrupt hidden-state relation frame is patched toward clean or placebo targets. In 70B and 405B, clean-targeted relation-frame paths recover clean-answer behavior and residual relation geometry, while centroid-only and equal-norm controls show negligible recovery. Site/order controls further separate marker-site importance from ordered clean-frame geometry: target clean shape and cross-prompt clean shape recover behavior and residual geometry at the marker interface, whereas corrupt-donor transfer, same-site permutation/reflection, wrong-site clean deltas, centroid-only motion, and equal-norm noise fail or remain far below clean-frame paths. The result is a controlled bridge from relation probing to relation-frame intervention: relation rank geometry can be detected, targeted, and behaviorally validated in transformer hidden states.

2605.29631 2026-05-29 cs.CL cs.AI

Predicting Causal Effects from Natural Language Queries using Structured Representations

使用结构化表示从自然语言查询预测因果效应

Giuliano Martinelli, Piriyakorn Piriyatamwong, Abelardo Carlos Martinez Lorenzo, Jasmin Baier, Riccardo Orlando, Satvik Garg, Sharif Kazemi, Linxi Wang, Arianna Legovini, Samuel Fraiberger

AI总结 针对从自然语言查询预测因果效应的问题,提出Query2Effect基准和两步框架,通过生成结构化表示再预测效应大小,微调使绝对误差降低27%-71%。

详情
Comments
18 pages
AI中文摘要

随机对照试验是医学和社会科学的基石,因为它们能够可靠地估计因果效应。然而,进行这些试验成本高昂且耗时,这激发了从现有实验证据预测因果效应的兴趣。大型语言模型(LLMs)的最新进展在知识密集型任务上表现出强大的性能,引发了一个问题:这些模型能否用于预测因果效应大小?为了研究这一点,我们引入了Query2Effect,这是一个新的大规模基准,包含超过72,000个与实验描述对齐的自然语言问题,通过改变查询在隐含性、抽象性和歧义性维度上的特异性,模拟现实的信息寻求场景。然后,我们提出了一个两步框架,首先生成查询的合成结构化表示,然后使用监督编码器模型预测效应大小。实验表明,微调在提高预测性能方面起着关键作用,与开箱即用的提示式LLMs相比,绝对误差降低了-27%到-71%,并且我们的两步框架有利于域外泛化,突显了将语义解释与数值效应估计分离的好处。

英文摘要

Randomized controlled trials are a cornerstone of medicine and the social sciences as they enable reliable estimates of causal effects. However, they are costly and time-consuming to conduct, motivating interest in predicting causal effects from existing experimental evidence. Recent advances in large language models (LLMs) have demonstrated strong performance on knowledge-intensive tasks, raising the question of whether these models can be used for forecasting causal effect sizes. To investigate this, we introduce Query2Effect, a new large-scale benchmark consisting of more than 72,000 natural language questions aligned with experiment descriptions, created to simulate realistic information-seeking scenarios by varying query specificity along dimensions of implicitness, abstraction, and ambiguity. We then propose a two-step framework that first generates a synthetic structured representation of a query before predicting effect size using a supervised encoder model. Experiments show that finetuning plays a crucial role in improving prediction performance, with absolute error reducing by -27% up to -71% compared to prompted out-of-the-box LLMs, and that our two-step framework is beneficial for out-of-domain generalization, highlighting the benefits of separating semantic interpretation from numerical effect estimation.

2605.29630 2026-05-29 cs.CL cs.AI cs.IR

Entity-Collision: A Stratified Protocol for Attributing Retrieval Lift in Agent Memory

实体碰撞:一种用于归因智能体记忆检索提升的分层协议

Youwang Deng

AI总结 提出实体碰撞协议,通过控制实体重叠和标签分层,将BM25基线固定,从而将检索提升归因于嵌入器,并在多维度实验中揭示编码器容量并非唯一约束。

详情
Comments
48 pages with appendix; 6-page body, mandatory Limitations, References, and 7 appendices. Code, benchmarks, and 37 reproduce scripts: https://github.com/youwangd/engram (see paper/REPRODUCIBILITY.md). Apache 2.0
AI中文摘要

端到端的智能体记忆基准测试为每个检索器报告一个单一的hit@k指标,混淆了词汇泄漏(不受控制的查询/黄金/干扰实体重叠)与标签混合(偏好、服务、工具平均在一起)。我们提出实体碰撞,一种系统无关的协议,通过构造将BM25基线固定——每个干扰项共享答案的实体标记——并按判别器标签对查询进行分层,因此任何超过BM25的提升都可归因于嵌入器。应用于一个开源智能体记忆测试平台,涵盖5个标签×3个嵌入器×5个碰撞程度,并采用配对自助法95%置信区间,该协议揭示了一个双轴模式:256维哈希三元组仅在深度碰撞下的封闭词汇标签上有帮助;MiniLM-384在两个轴上均占优;而参数规模2.7倍的BGE-large并未在MiniLM上一致提升——它在意图式查询上胜出,但在词汇式查询上落败。编码器容量本身并非约束条件。合成意图标签的零假设在LongMemEval(n=500)上重现为单会话偏好回忆悬崖。LoCoMo上的自适应向量权重路由是一个测量的零假设:存在11.7个百分点的oracle空间,但我们测试的所有信号均未恢复。所有26个结果表和37个复现脚本均受版本控制并由公共注册表验证;该协议在一个确定性管理的记忆测试平台(事件溯源决策日志、DAG状态机模式生命周期)上执行,因此每个报告的置信区间都可以从输入流中逐字节复现。

英文摘要

End-to-end agent-memory benchmarks report a single hit@k per retriever, confounding lexical leakage (uncontrolled query/gold/distractor entity overlap) with tag-mixing (preferences, services, tools averaged together). We propose entity-collision, a system-agnostic protocol that pins the BM25 floor by construction -- every distractor shares the answer's entity tokens -- and stratifies queries by discriminator tag, so any lift over BM25 is attributable to the embedder. Applied to an open-source agent-memory testbed across 5 tags x 3 embedders x 5 collision degrees with paired-bootstrap 95% CIs, the protocol reveals a two-axis pattern: a 256-d hash trigram helps only on closed-vocabulary lexical tags at deep collision; MiniLM-384 dominates both axes; and a 2.7x-parameter BGE-large does not uniformly improve on MiniLM -- it wins on intent-style queries but loses on lexical ones. Encoder capacity alone is not the binding constraint. The synthetic intent-tag null replicates on LongMemEval (n=500) as a single-session-preference recall cliff. Adaptive vector-weight routing on LoCoMo is a measured null: 11.7pp of oracle headroom exists, but no signal we tested recovers it. All 26 result tables and 37 reproduce scripts are version-controlled and verified by a public registry; the protocol is exercised on a deterministically governed memory testbed (event-sourced decision log, DAG-state-machine schema lifecycle) so every reported CI is reproducible byte-for-byte from the ingest stream.

2605.29629 2026-05-29 cs.AI

Beyond Attack Success Rate: Temporal Logit Observability for LLM Safety Failures

超越攻击成功率:LLM安全失效的时间对数可观测性

Junyoung Park, Sunghwan Park, Seongyong Ju, Jaewoo Lee

AI总结 提出时间对数可观测性(TLO)方法,通过解码过程中的合规-拒绝边际将模型-攻击条件映射到校准的二维平面,揭示攻击成功的时间模式,并基于此设计早期停止规则将成功越狱减少一半以上。

详情
AI中文摘要

攻击成功率(ASR)在生成结束时用单个是/否标签评估每次越狱,告诉我们是否发生了失败,但未说明失败如何展开。产生同等有害输出的两次攻击可能遵循完全不同的路径,而ASR无法区分它们。我们仅从对数几率使这些隐藏路径变得可观测。时间对数可观测性(TLO)是一种无需训练的诊断方法,在解码过程中观察合规-拒绝边际,并将每个模型-攻击条件置于校准的二维平面上。通过设计,该平面在ASR信息量最小的情况下最具信息量:即在因真正不同原因而成功的攻击中。在四种对齐的LLM和三种越狱范式下,具有几乎相同ASR的攻击在平面上位于明显不同的点:同一模型可能通过不同的时间模式失败。在大多数条件下,几何形状与来自隐藏状态的拒绝方向探针匹配,但一个模型显示了固定词汇方法的局限性。从TLO导出的简单早期停止规则将成功的越狱减少一半以上,且对普通良性查询无误报。安全评估应报告失败发生的时间和方式,而不仅仅是是否发生。TLO仅从对数几率即可观测前两者。

英文摘要

Attack Success Rate (ASR) evaluates each jailbreak with a single yes/no label at the end of generation, telling us whether a failure happened but not how it unfolded. Two attacks that produce equally harmful outputs may have followed completely different paths, and ASR cannot tell them apart. We make those hidden paths observable from logits alone. Temporal Logit Observability (TLO) is a training-free diagnostic that watches a compliance-refusal margin during decoding and places each model-attack condition on a calibrated 2D plane. By design, this plane is most informative exactly where ASR is least informative: among attacks that succeed for genuinely different reasons. Across four aligned LLMs and three jailbreak paradigms, attacks with nearly identical ASR land at clearly different points on the plane: the same model can fail through different temporal patterns. The geometry matches refusal-direction probes from hidden states on most conditions, with one model showing the limit of our fixed-lexicon approach. A simple early-stop rule derived from TLO cuts successful jailbreaks by more than half, without false alarms on plain benign queries. Safety evaluation should report when and how a failure unfolds, not only whether it occurred. TLO makes the first two observable from logits alone.

2605.29628 2026-05-29 cs.SD cs.AI cs.CL cs.LG eess.AS

COMET: Concept Space Dissection of the Modality Gap in Audio-Text Multimodal Contrastive Embeddings

COMET:音频-文本多模态对比嵌入中模态间隙的概念空间剖析

Yonggang Zhu, Liting Gao, Aidong Men, Wenwu Wang

AI总结 提出COMET框架,通过PLS-SVD分解揭示CLAP模型中模态间隙主要由少数共享概念轴贡献,并基于谱截断方法无训练地缓解间隙,实现零样本音频字幕接近全监督性能。

详情
AI中文摘要

对比语言-音频预训练(CLAP)模型广泛用于音频理解,并在许多零样本应用中支持模态无关的条件交换。然而,其性能受到音频和文本嵌入之间模态间隙的严重影响。现有解释主要将此间隙归因于锥体效应,将其视为均值嵌入之间的偏移,但仅纠正均值只能带来有限的改进。其他假设,如信息不平衡和维度坍缩,也被提出,但仍未得到充分验证,并且在音频领域尚未被深入研究。同时,一些工作尝试将多模态对比嵌入分解为可解释的概念,但没有任何工作从概念分解的角度显式分析模态间隙。在这项工作中,我们引入了COMET(基于PLS-SVD变换的概念空间组织与模态间隙解释),这是一个新颖的用于CLAP的偏最小二乘奇异值分解(PLS-SVD)框架,揭示了模态间隙的更广泛视角。我们的框架揭示,只有一小部分可解释的轴(捕捉共享概念)对相似度计算有显著贡献,并且均值分量仅部分代表模态间隙。基于这一见解,我们提出了一种简单的谱截断方法,以无训练的方式缓解模态间隙。该方法使得零样本音频字幕通过条件交换接近全监督性能,无需大型辅助记忆库或昂贵计算。同时,它在保持检索和音频字幕任务强性能的同时,实现了显著的嵌入维度缩减。

英文摘要

Contrastive Language-Audio Pretraining (CLAP) models are widely used for audio understanding and support modality-agnostic condition swapping in many zero-shot applications. However, their performance is heavily affected by the modality gap between audio and text embeddings. Existing explanations mainly attribute this gap to the cone effect, treating it as a shift between mean embeddings, yet correcting the mean alone yields only limited improvements. Alternative hypotheses, such as information imbalance and dimensionality collapse, have also been proposed, but they remain insufficiently verified and have not been thoroughly studied in the audio domain. Meanwhile, several works attempt to decompose multimodal contrastive embeddings into interpretable concepts, but none explicitly analyze the modality gap from the perspective of concept decomposition. In this work, we introduce COMET (Concept space Organization and Modality gap Explanation with PLS-SVD Transformation), a novel partial least squares singular value decomposition (PLS-SVD) framework for CLAP that unveils a broader perspective of the modality gap. Our framework reveals that only a small, interpretable subset of axes, which captures shared concepts, contributes substantially to similarity computation, and that the mean component represents only partially the modality gap. Building on this insight, we propose a simple spectral truncation method that mitigates the modality gap in a training-free manner. The method enables zero-shot audio captioning with condition swapping to approach fully supervised performance, without requiring large auxiliary memory banks or expensive computation. At the same time, it achieves substantial embedding dimensionality reduction while preserving strong performance on retrieval and audio captioning tasks.

2605.29626 2026-05-29 cs.CL cs.AI

DLM-SWAI: Steering Diffusion Language Models Before They Unmask

DLM-SWAI: 在扩散语言模型去掩码之前引导它们

Hyeseon An, Yo-Sub Han

AI总结 提出一种无需训练的引导方法DLM-SWAI,通过预计算的词级风格分数在去噪步骤中偏置词分布,实现扩散语言模型的可控生成。

详情
Comments
preprint
AI中文摘要

将语言模型生成引导至期望的文本属性对于实际部署至关重要,而推理时方法特别有吸引力,因为它们无需重新训练即可实现可控生成。最近的研究也强调了扩散语言模型作为一种新兴的生成范式,具有独特的解码特性。然而,大多数现有的引导方法要么依赖辅助模型,要么专为自回归下一个词解码设计,难以应用于通过部分掩码序列的迭代去噪生成文本的扩散语言模型(DLM)。因此,我们提出DLM-SWAI,一种简单的无需训练的引导方法,通过使用预计算的词级风格分数在每个去噪步骤偏置词分布。在风格和安全控制任务上的实验表明,DLM-SWAI有效引导扩散语言模型,同时保持生成质量并需要最小的计算开销。消融实验进一步揭示了引导强度与流畅性之间的可控权衡,我们的分析将类别可引导性与词级属性线索的强度联系起来。

英文摘要

Steering language model generation toward desired textual properties is essential for practical deployment, and inference-time methods are particularly appealing because they enable controllable generation without retraining. Recent work has also highlighted diffusion language models as an emerging generation paradigm with distinct decoding properties. However, most existing steering approaches either rely on auxiliary models or are designed for autoregressive next-token decoding, making them difficult to apply to diffusion language models DLMs, which generate text through iterative denoising of partially masked sequences. Therefore, we propose DLM-SWAI, a simple training-free steering method that biases the token distribution at each denoising step using pre-computed token-level style scores. Experiments on style and safety control tasks show that DLM-SWAI effectively steers diffusion language models while preserving generation quality and requiring minimal computational overhead. Ablations further reveal a controllable trade-off between steering strength and fluency, and our analysis links class-wise steerability to the strength of token-level attribute cues.

2605.29625 2026-05-29 cs.AI

Improving Collaborative Storytelling with a Multi-Agent Framework Based on Large Language Models

基于大语言模型的多智能体框架改进协作故事讲述

Arturo Valdivia, Paolo Burelli

AI总结 提出一种基于大语言模型的多智能体框架,通过迭代的写者-编辑者过程,在物理棋盘游戏中与儿童协作生成高质量故事。

详情
AI中文摘要

共同创作(即AI智能体与人类交互生成输出(如艺术))的话题近期获得了显著关注。然而,大多数研究关注数字环境中的成人-人类交互。本文探索了一种新颖的游戏式共同创作场景,涉及儿童和大语言模型(LLMs)通过物理棋盘游戏交互来创作书面故事。我们的目标是开发一个多智能体框架,能够生成适合年轻玩家的高质量叙事。我们方法的核心是一个迭代的写者-编辑者过程,其中一个LLM生成故事,另一个评估故事并提供改进反馈。通过涉及多个LLM的模拟研究,我们表明这种迭代交互在连续循环中持续提高了生成故事的感知质量。结果表明,在交互式故事讲述系统中,少量改进步骤可能足以实现高质量输出。

英文摘要

The topic of Co-creation, i.e., AI agents interacting with humans to generate outputs (e.g., art), has gained significant attention recently. However, most studies focus on adult-human interactions in a digital setting. This paper explores a novel ludic co-creation scenario involving children and Large Language Models (LLMs) interacting through a physical board game to create written stories. Our goal is to develop a multi-agent framework capable of producing high-quality narratives suitable for young players. At the core of our approach is an iterative Writer-Editor process in which one LLM generates stories while another evaluates them and provides feedback for refinement. Through a simulation study involving multiple LLMs, we show that this iterative interaction consistently improves the perceived quality of generated stories across successive loops. The results indicate that a small number of refinement steps may be sufficient to achieve high-quality outputs in interactive storytelling systems.