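arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.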

2605.23942 2026-05-26 cs.AI

Authority Inversion in LLM-Mediated Ubiquitous Systems: When Models Trust Users over Sensors

Long Zhang, Zi-bo Qin, Wei-neng Chen

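AI summary: This study shows that when large language models fuse conflicting sensor and user information, format dependence lets natural-language user claims dominate numerical sensor data, a phenomenon termed authority inversion. It proposes a geometric framework, two audit metrics (CIR and AAI), and an inference-time layer-level intervention (GAC) to diagnose and mitigate the problem.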

Abstract

Large language models (LLMs) increasingly fuse heterogeneous inputs in ubiquitous systems. Yet, how LLMs implicitly allocate authority when sensor measurements and user claims conflict remains unexamined, raising critical reliability concerns for deployments where physical sensing must retain priority. Unlike explicit traditional fusion, LLMs bury authority allocation within learned representations. We discover this allocation is severely format-dependent: numerical sensor data fails to integrate into answer-relevant model directions, allowing natural-language claims to dominate the final decision, a phenomenon we term Authority Inversion. To diagnose and mitigate this, we develop a geometric framework of context integration, introduce two computable audit metrics, specifically the Context Integration Ratio (CIR) and Authority Alignment Index (AAI), and propose Geometric Authority Calibration (GAC), an inference-time layer-level intervention to suppress misplaced user authority. Evaluating four models (4B to 35B parameters, three architectures) across four datasets totaling 576 conflict instances reveals extreme inversion: on numerical tasks, models exhibit near-zero sensor trust (AAI = -0.805, Cohen's d = -2.14), unaffected by model capacity. Validating our geometric framework, theory-guided causal injection flips 80.2% of incorrect decisions (vs. <0.4% for random controls). Practically, GAC improves HAR accuracy from 0-1.6% to 21.9-27.5%, outperforming prompting baselines. Ultimately, authority allocation in LLM-mediated systems must be explicitly audited and application-specifically configured rather than left implicit.

2605.23936 2026-05-26 cs.AI cs.LG

Fuzzy, Neutrosophic, and Uncertain Graph Theory: Properties and Applications

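Takaaki Fujita, Florentin Smarandache

AI summary: This book systematically surveys graph theory under uncertainty. Centered on the uncertain graph framework, it unifies fuzzy, neutrosophic, and related models, and introduces extended graph classes and their applications in molecular graphs, decision systems, graph neural networks, and other areas.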

Comments 326 pages. Publisher: Neutrosophic Science International Association (NSIA) Publishing House. ISBN: 978-197250204-4

2605.23935 2026-05-26 cs.AI cs.CY cs.MA cs.SE cs.SY eess.SY

Operationalizing Reconstructive Authority: Runtime Construction, Dependency Resolution, and Execution Gating in Autonomous Agent Systems

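Marcelo Fernandez - TraslaIA

AI summary: This paper proposes a runtime execution model that uses dynamic dependency resolution and a recovery loop to ensure that an action executes only when authority can be constructed from the current state, guaranteeing safety and conditional liveness.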

Comments Agent Governance Series, Paper P6. Companion papers on arXiv: P0 (2604.17511), P1 (2603.18829), P2 (2604.17517). P3/4 and P5 submitted concurrently (pending arXiv IDs). Zenodo: 10.5281/zenodo.19699460

DOI: 10.5281/zenodo.19699460

Abstract

Autonomous agent systems fail not only due to incorrect decisions, but due to executing decisions whose authority no longer holds at runtime. Prior work defined Reconstructive Authority (RAM) as a condition for valid execution: actions are permitted only if authority can be constructed from current state. This paper addresses enforcement at runtime: how to enforce this condition in a running system. We introduce a runtime execution model in which authority is evaluated at action time and execution is conditioned on its constructibility. This extends the execution state space beyond admit/deny with a third state, halt, representing cases where authority is undefined due to incomplete or uncertain observability. We define a concrete execution protocol including dynamic dependency resolution, authority reconstruction, and explicit decision semantics. We further introduce a Recovery Loop that integrates drift detection (IML) with execution control (ACP), allowing the system to suspend execution, acquire missing information, and re-attempt authority reconstruction. We show that this model guarantees safety (no action is executed without constructible authority) and conditional liveness: execution resumes when authority-defining variables become observable. This work operationalizes reconstructive authority as a runtime enforcement mechanism, providing the execution semantics required to apply RAM in real systems.
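
The three-state gate and recovery loop described in this abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration (the names `Decision`, `evaluate`, `run_with_recovery`, the `policy` and `acquire` callbacks, and the toy variables `role` and `lease_valid`); it is not the paper's protocol, only one way the admit/deny/halt semantics could look.

```python
from enum import Enum


class Decision(Enum):
    ADMIT = "admit"
    DENY = "deny"
    HALT = "halt"


def evaluate(required, state, policy):
    """Gate one action on authority constructed from the current state.

    required: names of the variables the authority depends on (hypothetical).
    state:    observed variables; a missing key or None means "not observable".
    policy:   function(dict) -> bool deciding whether authority holds.
    """
    missing = [name for name in required if state.get(name) is None]
    if missing:
        # Authority is undefined rather than false, so halt instead of denying.
        return Decision.HALT, missing
    view = {name: state[name] for name in required}
    return (Decision.ADMIT if policy(view) else Decision.DENY), []


def run_with_recovery(required, state, policy, acquire, max_attempts=3):
    """On HALT, try to observe the missing variables, then re-evaluate."""
    for _ in range(max_attempts):
        decision, missing = evaluate(required, state, policy)
        if decision is not Decision.HALT:
            return decision
        for name in missing:
            value = acquire(name)  # returns None while still unobservable
            if value is not None:
                state[name] = value
    return evaluate(required, state, policy)[0]


def policy(view):
    return view["role"] == "operator" and view["lease_valid"]


state = {"role": "operator"}      # lease_valid has not been observed yet
sensors = {"lease_valid": True}   # hypothetical source of missing information
first, missing = evaluate(["role", "lease_valid"], state, policy)
final = run_with_recovery(["role", "lease_valid"], state, policy, sensors.get)
print(first.value, missing, final.value)  # halt ['lease_valid'] admit
```

In this sketch ADMIT is only reachable when every required variable is observed and the policy holds, which mirrors the safety property stated above; HALT persists for as long as a required variable stays unobservable.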

2605.23934 2026-05-26 cs.AI quant-ph

Practical Quantum CIM Empowerment via All-Domestic-Core Agentic Large Model

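Wang Rui, Lu Diannan

AI summary: This study combines a femtosecond laser-pumped coherent Ising machine with an LLM-driven agentic system to perform QUBO/Ising model calibration, constraint weight decision iteration, and rapid validation of literature-reported schemes, all implemented entirely on domestic large models and hardware. It also reports a new paradigm in which agent-assisted quantum computing iterations in turn strengthen the agent's problem-solving capability.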

Comments 21 pages 7 figures

Abstract

Quantum computing devices are recognized as powerful tools for solving NP-complete problems. However, the intricacy of their modeling presents notable barriers for non-specialists, while the tedious iteration of constraint weights and modeling methodologies also consumes substantial expert effort. To address these challenges, this study integrates a femtosecond laser-pumped Coherent Ising Machine (CIM) with an LLM-driven agentic system by leveraging the LangGraph and LangChain frameworks. Comprehensive investigations demonstrate that large language models (LLMs) can effectively perform modeling tasks such as QUBO/Ising model calibration, constraint weight decision iteration, and rapid validation of literature-reported schemes. Notably, all these tasks can be implemented entirely with domestic large models; combined with domestically developed CIM hardware, this achieves practical empowerment of quantum CIM that relies fully on all-domestic agentic large models and hardware. This work successfully realizes robust technological integration, laying a solid foundation for subsequent research. Nevertheless, it also identifies the persisting challenges in the two cutting-edge fields of large models and quantum computing at the current stage. Encouragingly, we unexpectedly discover a promising new paradigm where accumulated knowledge from agent-assisted quantum computing iterations reciprocally enhances the agent's own problem-solving capability, thereby addressing these challenges.
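
To make "constraint weight decision iteration" concrete, here is a small Python sketch of the general idea, not the paper's agent or hardware pipeline. It assumes a toy problem (pick exactly one of three items at minimum cost), encodes the constraint as a penalty term in a QUBO, and doubles the penalty weight until the minimiser is feasible. The function names, the doubling rule, and the brute-force solver standing in for the CIM are all illustrative assumptions.

```python
from itertools import product


def build_qubo(costs, weight):
    """Upper-triangular QUBO for: minimise sum(c_i x_i) + weight * (sum(x) - 1)^2.

    Uses x_i^2 = x_i for binary x; the constant offset `weight` is dropped.
    """
    n = len(costs)
    q = [[0] * n for _ in range(n)]
    for i in range(n):
        q[i][i] = costs[i] - weight
        for j in range(i + 1, n):
            q[i][j] = 2 * weight
    return q


def energy(q, x):
    n = len(x)
    return sum(q[i][j] * x[i] * x[j] for i in range(n) for j in range(i, n))


def solve_brute_force(q):
    """Exhaustive search; a stand-in for the solver, only viable for tiny n."""
    return min(product((0, 1), repeat=len(q)), key=lambda x: energy(q, x))


def calibrate(costs, weight=2, max_rounds=10):
    """Double the penalty weight until the best assignment picks exactly one item."""
    for _ in range(max_rounds):
        best = solve_brute_force(build_qubo(costs, weight))
        if sum(best) == 1:
            return weight, best
        weight *= 2
    raise RuntimeError("no feasible solution found")


weight, best = calibrate([-6, -5, -2])
print(weight, best)  # 8 (1, 0, 0)
```

With weights 2 and 4 the cheapest assignment selects two items because the reward outweighs the penalty; at weight 8 the constraint dominates and the minimiser becomes feasible. This is the kind of trial-and-error loop the abstract says consumes expert effort.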

2605.23932 2026-05-26 cs.AI cs.CL cs.CY cs.LG

When Correct Beliefs Collapse: Epistemic Resilience of LLMs under Clinical Pressure

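Boyu Xiao, Xiuqi Tian, Xuwen Song, Haochun Wang, Guanchun Song, Sendong Zhao, Bing Qin

AI summary: This work studies the belief stability of LLMs under escalating pressure in clinical dialogue. It proposes the Med-Stress stress-testing framework, identifies a knowledge-resilience gap, and designs RBED and R-FT to improve robustness.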

Comments ACL 2026

Abstract

Despite strong medical benchmark accuracy, LLMs can exhibit severe multi-turn sycophancy in clinical dialogue, abandoning an initially correct diagnosis under escalating pressure. We propose Med-Stress, a targeted stress test framework that evaluates belief stability under escalating pressure. Across nine frontier large language models (LLMs), we find a clear dissociation between medical knowledge and robustness: high initial diagnostic capability does not imply high belief stability, yielding large knowledge-robustness gaps for several LLMs. To mitigate this failure mode, we propose a lightweight inference-time defense, RBED (Role-Based Epistemic Defense), and R-FT (Resilience-oriented Fine-Tuning), a training-time approach that internalizes evidence-based resistance to pressure. Experiments show that R-FT nearly eliminates belief change and substantially improves robustness.

2605.23931 2026-05-26 cs.AI cs.PL cs.SE

BODHI: Precise OS Kernel Specification Inference

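Zhiming Chang, Ziyang Li

AI summary: Proposes BODHI, a domain knowledge prompting method that augments few-shot prompting with a structured C-to-Python translation guide. On the OSV-Bench benchmark it raises Pass@1 from 55.10% to 96.73%, narrowing the gap between general-purpose code generation and formal specification synthesis.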

Abstract

The formal verification of operating system kernels requires precise specifications that capture the intended behavior of system calls. Writing these specifications manually demands deep domain expertise, motivating the use of large language models (LLMs) to automate the process. However, in OSV-Bench, a benchmark of 245 specification generation tasks derived from the Hyperkernel OS kernel, the best reported Pass@1 is 55.10%. We propose a domain knowledge prompting method (BODHI), which augments the standard few-shot prompt with a structured C-to-Python translation guide covering 15 categories of domain-specific translation patterns. Inspired by Structured Chain-of-Thought (SCoT) prompting, the guide organizes translation by separation of concerns, addressing pre-condition extraction and post-condition generation as distinct categories. Evaluated on nine models from six providers (Anthropic, Mistral, Amazon, DeepSeek, Meta, Alibaba), covering dense, mixture-of-experts and reasoning architectures, BODHI improves every model tested, with gains ranging from +11% to +32%. The best configuration (Claude Opus 4.6 + BODHI) reaches 96.73% Pass@1. BODHI reduces both syntax and semantic errors, with the strongest effect on models that have sufficient instruction-following capability to utilize structured reference material. These results demonstrate that domain knowledge injection is a model-agnostic technique that substantially bridges the gap between general-purpose code generation and formal specification synthesis.

2605.23930 2026-05-26 cs.AI cs.LG cs.MA

Quantum Frog: Emergent Cooperation and Difficulty Scaling in a Quantized-Time Cooperative Game

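Saad Mankarious

AI summary: Analyzes the quantized-time cooperative game Quantum Frog with reinforcement learning, finding that a synchronized rush strategy is optimal and that cooperative training substantially raises success rates and shortens episodes.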

Abstract

We introduce Quantum Frog, a two-player cooperative game built on a novel quantized-time mechanic in which the environment advances only when a player acts. Inspired by the classic arcade game Frogger, Quantum Frog requires two frogs to cross an 8×8 grid of traffic and reach the far side together. We use reinforcement learning (RL) as an analytical lens to answer four design questions: (1) how does game difficulty scale with traffic density, (2) what is the optimal single-agent policy and why, (3) how large is the cooperation gap between independent and cooperative two-agent play, and (4) what joint strategy emerges when agents are incentivised to cooperate? We train agents through five escalating stages, Tabular Q-Learning, Deep Q-Network (DQN), Independent DQN (IDQN), and Multi-Agent Proximal Policy Optimisation (MAPPO with a centralised critic), evaluating each against traffic densities of one to six cars. Our key findings are: (i) the quantized-time mechanic makes a rush strategy (moving directly upward at every step) universally optimal, as time exposure to traffic is minimised; (ii) adding an uncoordinated second player is harder than sextupling the traffic for a single expert player; (iii) cooperative training recovers +32-34 percentage points of joint success rate relative to independent agents and reduces episode length from ~90 to ~6 steps; and (iv) the emergent cooperative strategy is synchronised rushing, not complex positional coordination, illustrating that shared incentives alone suffice to align agents in time-critical cooperative tasks. These findings provide concrete, empirically grounded guidance for the commercial design of Quantum Frog and offer broader insights into the role of environment mechanics in shaping multi-agent learning dynamics.
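
The quantized-time mechanic can be illustrated with a toy single-frog environment in Python. This is a sketch under stated assumptions and does not reproduce the game or the paper's results: the class name, the car encoding `(row, col, direction)`, the wrap-around traffic, and the collision rule (same cell after both moves) are all hypothetical choices.

```python
class QuantizedTimeCrossing:
    """Toy single-frog crossing in which the world only ticks inside step()."""

    SIZE = 8
    MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

    def __init__(self, cars, start_col=3):
        self.cars = [list(car) for car in cars]   # each car is [row, col, direction]
        self.frog = [self.SIZE - 1, start_col]    # start on the bottom row
        self.ticks = 0

    def step(self, action):
        dr, dc = self.MOVES[action]
        self.frog[0] = min(max(self.frog[0] + dr, 0), self.SIZE - 1)
        self.frog[1] = min(max(self.frog[1] + dc, 0), self.SIZE - 1)
        for car in self.cars:                     # traffic advances only when the frog acts
            car[1] = (car[1] + car[2]) % self.SIZE
        self.ticks += 1
        if any(car[:2] == self.frog for car in self.cars):
            return "hit"
        return "goal" if self.frog[0] == 0 else "running"


def rush(env, max_steps=50):
    """Rush policy from the abstract: move up at every step."""
    for _ in range(max_steps):
        status = env.step("up")
        if status != "running":
            return status, env.ticks
    return "timeout", env.ticks


print(rush(QuantizedTimeCrossing([(4, 6, 1)])))  # ('goal', 7)
print(rush(QuantizedTimeCrossing([(4, 0, 1)])))  # ('hit', 3)
```

Because traffic moves only inside `step`, every extra action is extra exposure to traffic, which is the intuition the abstract gives for why rushing minimises risk. The second run shows that in this toy layout rushing is not automatically safe, so the sketch says nothing about optimality in the real game.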

2605.23929 2026-05-26 cs.AI cs.SE

Toward Reliable Design of LLM-Enabled Agentic Workflows: Optimizing Latency-Reliability-Cost Tradeoffs

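Ya-Ting Yang, Quanyan Zhu

AI summary: This paper models the performance of LLM and non-LLM agents with parameterized exponential reliability functions, proposes a water-filling token allocation strategy, and characterizes the shadow prices of optimal workflow reliability to address the tradeoffs among latency, reliability, and cost.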

2605.23928 2026-05-26 cs.AI cs.CL cs.DC cs.MA cs.PL cs.SE

Context: Proactive Goal-Directed Intelligence via Composable Sandboxed Programs, Declarative Wiring, and Structured Interaction

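Gregory Magarshak

AI summary: Proposes the Context architecture, which achieves proactive goal-directed intelligence through composable sandboxed programs, declarative wiring, and structured interaction, and proves its advantages in cost, correctness, and efficiency.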

Comments 7 pages; third in a series with arXiv:2501.XXXXX (Magarshak Machine / SPACER) and arXiv:2502.XXXXX (Grokers)

Abstract

We present Context, the intelligence layer of the Magarshak Architecture, which replaces reactive query-response chatbots with proactive goal-directed agents that advance shared tasks without waiting for user prompts. The architecture rests on three mutually reinforcing mechanisms. Write-time context assembly precomputes enriched typed attributes via Groker agents, assembling interaction context as a deterministic pure function of graph state; context blocks are byte-identical across turns between semantic changes, enabling near-100% KV-cache reuse. Composable sandboxed wisdom programs form a governed library of LM-generated imperative programs declaratively wired to goal types via typed stream relations, composed via phase ordering, and executed at interaction time without further LM calls. Proactive goal stream state machines drive conversations toward terminal states by inspecting graph state and emitting structured interaction content (option arrays, governance affordances, clarification prompts) without awaiting user input. We prove six formal results: the Context Stability Theorem, bounding per-turn LM cost as a function of semantic change rate; a Program Composition Correctness Theorem; a Declarative Wiring Soundness Theorem; the Proactive Dominance Theorem, proving proactive agents weakly dominate reactive agents on expected turns-to-terminal-state; Coordination Overhead Elimination and Quality Preservation, establishing Pareto improvements in multi-participant goal chats; and a Cross-Platform Vote Consistency Theorem. Implemented in the open-source Qbix / Safebox / Safebots stack.
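
The claim that context blocks are a deterministic pure function of graph state, and therefore byte-identical between semantic changes, can be illustrated with a short Python sketch. The canonical-JSON serialisation, the SHA-256 cache key, and the example state fields are assumptions for illustration only; the abstract does not describe how the Qbix / Safebox / Safebots stack actually serialises context.

```python
import hashlib
import json


def assemble_context(graph_state):
    """Serialise graph state canonically so equal states give identical bytes."""
    text = json.dumps(graph_state, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def cache_key(context_block):
    """Hypothetical key under which a cached prefix could be looked up."""
    return hashlib.sha256(context_block).hexdigest()


turn_1 = {"goal": "plan-trip", "participants": ["ana", "bo"], "votes": {"ana": 1}}
turn_2 = {"votes": {"ana": 1}, "participants": ["ana", "bo"], "goal": "plan-trip"}
turn_3 = {"goal": "plan-trip", "participants": ["ana", "bo"], "votes": {"ana": 1, "bo": 1}}

same = assemble_context(turn_1) == assemble_context(turn_2)
changed = cache_key(assemble_context(turn_1)) != cache_key(assemble_context(turn_3))
print(same, changed)  # True True
```

Turns 1 and 2 hold the same state in a different key order and serialise to the same bytes, so a prefix cache keyed on those bytes could be reused; turn 3 adds a vote, which changes the block and would invalidate that cache entry.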

2605.23926 2026-05-26 cs.AI cs.LG

Raon-Speech Technical Report

Beomsoo Kim, Changho Choi, Dohyun Kim, Dongki Lee, Ethan Ewer, Eunchong Kim, Gyeongman Kim, Haechan Kim, Hyeonghwan Kim, Inkyu Park, Jihun Yun, Jihwan Moon, Jiyun Kim, Joonghyun Bae, Junhyuck Kim, Minkyu Kim, Sehun Lee, Seungjun Chung, Sungwoo Cho, Dongmin Park, Dongwon Kim, Hara Kang, Jonghyun Lee, Keon Lee, Kangwook Lee, Jaewoong Cho

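AI summary: This report presents Raon-Speech, a 9B-parameter speech language model trained in multiple stages for speech understanding, answering, and generation in English and Korean. It is extended into the full-duplex conversational model Raon-SpeechChat and outperforms comparable models on speech tasks.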

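Scene Abstraction for Lexical Semantics: Structured Representations of Situated Meaning

Yejin Cho, Katrin Erk

AI summary: Proposes the Scene Abstraction framework, which builds structured representations of the situations in which words are used via few-shot prompting of a large language model. Experiments show that scenes can be reliably identified and that the approach outperforms baselines.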

Abstract

Coffee and tea share many properties, yet they evoke strikingly different situations, atmospheres, and affective associations. These situated dimensions of word meaning are real and systematic, but they remain implicit in most computational representations of lexical meaning. We propose Scene Abstraction, a framework for constructing structured representations of the interpretive scenes that words participate in across usage contexts. Each scene consists of a Contextual Scene (Events, Entities, Setting) and an expression-centered Expression Profile (Engaged events, Generalizable properties, Evoked emotions), operationalized through few-shot prompting of a large language model. Our contributions are three-fold: (1) a structured representation framework for situated lexical meaning; (2) COCA-Scenes, a dataset of 520 usage instances across 26 keywords for distinct scene identification; and (3) empirical evidence from two experiments suggesting that scenes are reliably identifiable across human observers (82.4% accuracy, +11.8 pp over text-only embeddings) and that our scene profiles more closely align with human interpretation of words in context than ATOMIC-based alternatives (86.4% preference across three semantic dimensions).

2605.22532 2026-05-26 cs.LG

Relational Linear Properties in Language Models: An Empirical Investigation

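Giovanni Valer, Luigi Gresele, Marco Bronzini, Emanuele Marconato

AI summary: This paper proposes a KL-divergence-based probing method to empirically test the relational linearity hypothesis in language models (that, for a fixed relation, an object's unembedding can be predicted from the subject's embedding by a linear map), and finds that it varies with model, layer, and relation phrasing.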

Abstract

Linear properties are ubiquitous in the representations of language models; however, testing them experimentally remains a challenging task. This work focuses on relational linearity: the hypothesis that, for a fixed relation (e.g., "plays"), the unembedding of an object (e.g., "trumpet") can be predicted from the embedding of its subject (e.g., "Miles Davis") by a linear map. We present an experimental method to test the formulation of relational linearity by Marconato et al. (2025). Specifically, we introduce a probing method, based on Kullback-Leibler divergence, to evaluate this property and examine its variation across layers and paraphrased relational queries. It is also more efficient than previous work; for example, it avoids the crude Jacobian approximations used in Linear Relational Embeddings by Hernandez et al. (2024). Our findings across four datasets show that relational linearity varies across models, exhibits layer-wise patterns consistent with prior observations about linguistic information in model representations, and is differently affected by changes in how the relation is phrased.

2605.22093 2026-05-26 cs.AI

Knowledge Graph Re-engineering Along the Ontological Continuum (extended version)

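Enrico Daga, Valentina Tamma, Terry Payne

AI summary: This paper proposes the ontological continuum as a conceptual framework that describes, compares, and transforms knowledge graphs along two orthogonal dimensions (semantics vs. pragmatics, properties vs. affordances) to address integration and reuse across modelling practices, and validates it through a case study.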

Abstract

Knowledge graphs have become the primary vehicle for data integration and are critical to the success of modern AI, but the diversity of KG modelling practices, from lightweight vocabularies to richly axiomatised ontologies, makes integration and reuse expensive and brittle. This challenge is particularly acute in neuro-symbolic AI, where bridging neural and symbolic components depends on the ability to reengineer KGs to fit new requirements; GenAI now offers unprecedented automation capability, but without a principled understanding of the KG space, such automation remains conceptually ungrounded. We introduce the ontological continuum as that missing conceptualisation, a theoretical construct whose characterisation framework is defined by two orthogonal distinctions: semantics vs pragmatics, and properties vs affordances; together these define a vocabulary to describe, compare, navigate, and transform KGs across the full range of modelling practices. The methodological stance is empirical: rather than prescribing how KGs should be modelled, the continuum aims to define a theory of the existent, derived from observation of real-world KG engineering practices and whose structure can be made formally explicit, for example, through Formal Concept Analysis (FCA). We ground the vision through a case study on provenance knowledge, showing how a single concern manifests differently across the continuum. We articulate five open research challenges and invite the community to develop the ontological continuum as a shared research agenda.

2605.22005 2026-05-26 cs.LG cs.AI cs.CL

Check Your LLM's Secret Dictionary! Five Lines of Code Reveal What Your LLM Learned (Including What It Shouldn't Have)

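Hisashi Miyashita

AI summary: Applying singular value decomposition to the lm_head weight matrix (only five lines of PyTorch and no model inference) reveals interpretable semantic subspaces directly from the model weights and exposes the model's training data composition and curation philosophy.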

Abstract

We show that singular value decomposition of the lm_head weight matrix of a transformer-based large language model (requiring only five lines of PyTorch and no model inference) reveals interpretable semantic subspaces directly from the model weights. Each left singular vector identifies the vocabulary tokens most readily selected when the hidden state aligns with the corresponding singular direction; inspecting these clusters exposes the model's training data composition and curation philosophy. Analysing GPT-OSS-120B, Gemma-2-2B, and Qwen2.5-1.5B, we find that singular value spectra and vocabulary cluster structures differ systematically across models: GPT exhibits a graduated hierarchy of functionally differentiated subspaces; Gemma is dominated by pre-nineteenth-century English orthography, forming a stepwise clustering structure that may contribute to high output controllability; and Qwen exhibits broad multilingual coverage alongside subspaces whose vocabulary the authors have determined to be ethically inappropriate for direct publication. Base-instruct comparison reveals that ethically concerning subspaces originate in pretraining and are not removed by post-training alignment. We introduce the Vocabulary Cluster Score (VCS) to quantify subspace coherence, and the Weighted Projection Score (WPS) as a static glitch token detector; applying WPS to GPT-OSS-120B recovers shokubutsu-hyakka-tsu (ID 137606), a well-known glitch token widely reported in the CJK language community, without any model inference. We propose a taxonomy of root causes for problematic vocabulary content and call for lm_head SVD analysis to be adopted as a standard pre-release safety auditing step. Our findings further suggest directions toward SVD-guided tokenizer optimisation and more controllable LLM design.
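
The idea of reading vocabulary clusters off the left singular vectors can be sketched on a toy matrix in dependency-free Python. This is not the authors' five-line PyTorch snippet: the 5×3 matrix and token names are invented, and power iteration recovers only the leading singular direction as a stand-in for a full SVD. With PyTorch available, the full decomposition of a real (vocab × hidden) lm_head weight would typically come from `torch.linalg.svd`.

```python
import math


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def top_singular_triplet(w, iters=200):
    """Leading (u, sigma, v) of w, found by power iteration on w^T w."""
    wt = [list(col) for col in zip(*w)]
    v = [1.0] * len(wt)
    for _ in range(iters):
        v = matvec(wt, matvec(w, v))
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    wv = matvec(w, v)
    sigma = math.sqrt(sum(x * x for x in wv))
    return [x / sigma for x in wv], sigma, v


# Toy stand-in for an lm_head weight: one row per vocabulary token (hypothetical values).
tokens = ["cat", "dog", "one", "two", "the"]
weights = [
    [3.0, 1.0, 0.0],
    [3.0, -1.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 1.0],
]

u, sigma, v = top_singular_triplet(weights)
ranked = sorted(range(len(tokens)), key=lambda i: -abs(u[i]))
cluster = [tokens[i] for i in ranked[:2]]
print(cluster, round(sigma, 4))
```

The entries of `u` with the largest magnitude pick out the tokens that load most heavily on the leading direction, here "cat" and "dog". Inspecting such token lists for each singular direction is the kind of weight-only audit the abstract describes.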

2605.20670 2026-05-26 cs.LG

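Deep Neural Sheaf Diffusion

Rémi Bourgerie, Šarūnas Girdzijauskas, Viktoria Fodor

AI summary: To address representation collapse in deeply stacked graph neural networks, this work replaces the sheaf Laplacian with a sheaf adjacency operator, combined with normalization, odd nonlinearities, and gating, and significantly improves deep network performance on synthetic and real-world datasets.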

Comments Accepted at the ICML 2026 Workshop on Graph Foundation Models (GFM@ICML 2026). Code available at https://github.com/remibourgerie/deep-neural-sheaf-diffusion

Abstract

Deep Graph Neural Networks (GNNs) are essential for capturing complex dependencies in graph-structured data. However, scaling GNNs to depth remains challenging, as stacking layers leads to representation collapse and diminishing sensitivity due to repeated aggregation. While Neural Sheaf Diffusion (NSD) provides strong theoretical guarantees against such collapse, these guarantees do not translate to practice: as depth increases, the disagreement signal of the sheaf Laplacian vanishes, limiting the contribution of deeper layers. We identify mechanisms that hinder NSD effectiveness at depth and propose Deep Neural Sheaf Diffusion (DNSD), which replaces the sheaf Laplacian with a sheaf adjacency operator to maintain informative signals across layers. This is complemented by normalization, odd nonlinearities, and gating. To provide a principled explanation of the expected performance improvement, we contrast sheaf diffusion with graph attention mechanisms, highlighting that DNSD replaces scalar attention scores with matrix-valued edge functions and normalizes node representations rather than attention scores. We demonstrate empirically that DNSD effectively utilizes deep aggregation in graph tasks, outperforming GNN and NSD baselines by up to 30 pp in accuracy on synthetic long-range datasets, and consistently outperforming them on real-world benchmarks. These results position sheaf-based architectures as a promising building block for graph foundation models by supporting effective deep architectures.
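
The vanishing disagreement signal can be illustrated with scalar stalks in a few lines of Python. The Laplacian below follows the standard sheaf Laplacian construction (per-edge disagreement pulled back through the restriction maps). The adjacency operator is an assumed form (the same cross terms without the disagreement), since the abstract does not define DNSD's operator; the graph, maps, and features are toy values.

```python
def sheaf_operators(edges, maps, x):
    """Apply a sheaf Laplacian and an assumed sheaf adjacency operator to x.

    edges: list of (u, v) node pairs.
    maps:  maps[(node, edge_index)] is that node's scalar restriction map on the edge.
    x:     list of scalar node features (1-dimensional stalks).
    """
    lap = [0.0] * len(x)
    adj = [0.0] * len(x)
    for e, (u, v) in enumerate(edges):
        fu, fv = maps[(u, e)], maps[(v, e)]
        disagreement = fu * x[u] - fv * x[v]   # zero when both ends agree on the edge
        lap[u] += fu * disagreement
        lap[v] -= fv * disagreement
        adj[u] += fu * fv * x[v]               # neighbour's feature transported to u
        adj[v] += fv * fu * x[u]
    return lap, adj


# Path graph 0 - 1 - 2 with features chosen so every edge is in agreement.
edges = [(0, 1), (1, 2)]
maps = {(0, 0): 1.0, (1, 0): 2.0, (1, 1): 1.0, (2, 1): 0.5}
x = [2.0, 1.0, 2.0]

lap, adj = sheaf_operators(edges, maps, x)
print(lap, adj)
```

Once the features agree across every edge, which is the state diffusion drives towards, the Laplacian output is all zeros while the adjacency-style operator still returns nonzero values. That is the intuition behind keeping an informative signal in deep layers.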
