arXivDaily arXiv每日学术速递 周一至周五更新
2606.20351 2026-06-19 cs.LO cs.PL 新提交

A cubical formalisation of conditional independence, Bayesian conditioning, and Pearl's d-separation soundness

条件独立性、贝叶斯条件化和Pearl的d-分离正确性的立方形式化

Karen Sargsyan

AI总结 本文在Cubical Agda中形式化概率单子,提出一种高阶归纳类型,通过引入贝叶斯公式修正标准凸代数交换公理的不足,并验证了半图oid公理、do-演算规则和d-分离定理。

详情
AI中文摘要

自Stone以来概率单子形式化中常见的标准凸代数交换公理被证明不足以支持完整的贝叶斯条件化。我们在Cubical Agda中精确地说明了这一点:有限分布作为高阶归纳类型,条件独立性作为核之间的立方路径,递归贝叶斯条件化作为全支撑片段上的全函数。将条件化提升到完整的HIT暴露了一个结构上的不匹配——重新排列的四叶混合的两半携带由贝叶斯公式关联的不同贝叶斯权重,而不是标准公理提供的单个共享内部权重。我们展示了解决这一问题的最小推广,并证明了标准形式是退化情况,其中两个内部权重重合。基于这一观察,我们在抽象有序域接口之上以构造性方式验证了代数上下文,无需任何公设:绑定交换性、四个半图oid公理、交(通过结构性Σ-见证简化为收缩,无需正性)、Pearl的do-演算规则1、2和3的核形式、有限类型贝叶斯条件化,以及任意n顶点有限有向无环图(DAG)上干预和贝叶斯形式的Pearl的d-分离定理(正确性)。概率单子也被验证为马尔可夫范畴;抽象接口在Q处实现。

英文摘要

The standard convex-algebra interchange axiom, common to probability-monad formalisations since Stone, is provably too weak to support full Bayesian conditioning. We make this precise in Cubical Agda: finite distributions as a higher inductive type, conditional independence as a cubical path between kernels, recursive Bayesian conditioning as a total function on a full-support fragment. Lifting conditioning to the full HIT exposes a structural mismatch -- the two halves of the rearranged 4-leaf mix carry distinct Bayesian weights related by Bayes' formula, not the single shared inner weight the standard axiom provides. We exhibit the minimal generalisation that resolves this and prove the standard form is the degenerate case where the two inner weights coincide. Around this observation we verify the algebraic context constructively, with zero postulates above an abstract ordered-field interface: bind commutativity, the four semi-graphoid axioms, intersection (reduced to contraction via structural $Σ$-witnesses, without positivity), Pearl's do-calculus Rules~1, 2, and~3 in kernel form, finite-type Bayesian conditioning, and Pearl's d-separation theorem (soundness) on arbitrary $n$-vertex finite directed acyclic graphs (DAGs) in both interventional and Bayesian forms. The probability monad is also verified as a Markov category; the abstract interface discharges at $Q$.

2606.20134 2026-06-19 cs.LO cs.PL 新提交

An MSO Framework for Weak-Memory Verification and Robustness

弱内存验证与鲁棒性的MSO框架

Giovanna Kobus Conrado, Andreas Pavlogiannis

AI总结 本文研究单子二阶逻辑作为弱内存元理论,证明顺序一致性执行有界树宽而TSO无界,展示多种模型可MSO公理化,并引入读自鲁棒性概念,实现统一验证算法。

Comments Accepted at CONCUR 2026

详情
AI中文摘要

内存模型是并发程序执行的形式化规范,解释了编译器和架构优化引入的弱行为。其数量和复杂性的增加促使人们通过在适当的元理论中公理化模型来统一验证整个模型类别。本文正式研究单子二阶逻辑(MSO)作为弱内存的元理论,通过证明各种流行弱内存模型的树宽和MSO可表达性结果,使得我们能够统一处理多个验证问题。总结如下:首先,我们证明顺序一致性($\mathsf{SC}$)下的执行具有有界树宽,而总存储顺序($\mathsf{TSO}$)下的执行则无界。其次,我们证明包括Release/Acquire和完整RC20在内的广泛模型是MSO可公理化的,而其他模型如Strong Release/Acquire和$\mathsf{TSO}$则不可,除非正交向量问题(在SETH下需要二次时间)可以在线性时间内解决。最后,我们引入读自鲁棒性概念,作为对近期粗粒度鲁棒性准则工作的扩展。我们证明树宽界限(上界和下界)对任何MSO可公理化模型$\mathsf{MM}$具有深远的算法意义:存在一个算法,对于每个程序$\mathsf{P}$,要么验证$\mathsf{P}$在$\mathsf{MM}$下的正确性,要么报告$\mathsf{P}$对$\mathsf{MM}$不是读自鲁棒的。总体而言,我们的结果为弱内存验证和鲁棒性建立了一个丰富且多功能的理论框架。

英文摘要

Memory models are formal specifications of concurrent-program executions, accounting for weak behaviors introduced by compiler and architectural optimizations. The increase of their number and complexity has spawned efforts for uniform verification across whole classes of models, by axiomatizing the models in an adequate metatheory that admits a uniform treatment. In this work, we formally study Monadic Second-Order logic (MSO) as a metatheory for weak memory, by proving results on the treewidth and MSO-expressibility of various popular weak-memory models, as this combination allows us to uniformly tackle several verification problems. In summary, our results are as follows. First, we prove that executions under Sequential Consistency ($\mathsf{SC}$) have bounded treewidth, while already those under Total Store Order ($\mathsf{TSO}$) do not. Second, we prove that a broad range of models, including Release/Acquire and the full RC20, are MSO-axiomatizable, while others, such as Strong Release/Acquire and $\mathsf{TSO}$, are not, unless the Orthogonal Vectors problem $\unicode{x2013}$ which requires quadratic time under SETH $\unicode{x2013}$ can be solved in linear time. Finally, we introduce the notion of reads-from robustness, as an extension to recent work on coarse robustness criteria. We show that our treewidth bounds (both upper and lower) have far-reaching algorithmic implications for any of our MSO-axiomatizable models $\mathsf{MM}$: there is an algorithm that, for every program $\mathsf{P}$, either verifies $\mathsf{P}$ under $\mathsf{MM}$ or reports that $\mathsf{P}$ is not reads-from robust against $\mathsf{MM}$. Overall, our results establish a rich and versatile theoretical framework for weak-memory verification and robustness.

2606.20121 2026-06-19 cs.LO 新提交

BARReL: a modern backend for Atelier B in Lean

BARReL:Atelier B 在 Lean 中的现代后端

Ghilain Bergeron, Vincent Trélat

AI总结 BARReL 是一个 Lean 4 库,桥接工业 B 方法工具 Atelier B 与 Lean 证明助手,支持在 Lean 中交互式进行 B 开发,通过显式良定义条件编码部分算子,并利用依赖类型保证良定义性,同时提供基本自动化。

详情
AI中文摘要

BARReL 是一个 Lean 4 库,桥接了工业 B 方法工具 Atelier B 与 Lean 证明助手,使用户能够在 Lean 中交互式地进行形式化 B 开发(直至机器精化和实现),同时保留标准 B 语法。B 部分算子通过生成显式的良定义条件进行仔细编码,利用 Lean 的依赖类型从构造上强制实施良定义性纪律。也就是说,证明义务和证明步骤不能静默地依赖于类型错误或定义不当的实例化。BARReL 还具备基本自动化功能,尝试自动处理此类良定义条件。该实现完全使用 Lean 元编程编写,并设计为模块化:扩展支持的 B 片段通常只需添加新的语法和编码子句。我们通过一个小型但具有代表性的案例研究说明了该方法,并论证 BARReL 可以作为迈向基于 Lean 证明助手的高度可靠的 Atelier B 工具链的垫脚石。

英文摘要

BARReL is a Lean 4 library bridging Atelier B, an industrial tool for the B method, and the Lean proof assistant by enabling users to conduct their formal B developments -- up to machine refinement and implementation -- interactively inside Lean, while retaining standard B syntax. B partial operators are carefully encoded by generating explicit well-definedness conditions, leveraging Lean's dependent types to enforce a well-definedness discipline by construction. That is, proof obligations and proof steps cannot silently rely on ill-typed or ill-defined instantiations. BARReL also features basic automation to try to discharge such well-definedness conditions automatically. The implementation is written entirely using Lean meta-programming and is designed to be modular: extending the supported B fragment typically requires only adding new syntax and encoding clauses. We illustrate the approach on a small but representative case study, and argue that BARReL can act as a stepping stone towards a strongly reliable Atelier B toolchain grounded in the Lean proof assistant.

2606.19936 2026-06-19 cs.LO cs.MM 新提交

Prismriver: Formalization of Music Theory and Algorithmic Composition in Lean 4

Prismriver:Lean 4 中音乐理论与算法作曲的形式化

Leni Aniva, Claire Wang

AI总结 使用 Lean 4 形式化音乐理论,实现可验证的算法作曲与伴奏生成,并支持音乐结构的单子分析。

详情
AI中文摘要

音乐理论遵循丰富的数学规则和对称性。这些对称性遵循数学结构,可以在证明助手的精确语言中进行验证和表达。在本文中,我们提出了 Prismriver,一个在 Lean 4 中形式化的音乐理论。通过在 Lean 4 中形式化音乐理论,我们为可验证的算法作曲和伴奏生成打开了大门。我们还实现了对音乐结构中单子分析的支持。

英文摘要

Music theory obeys a rich set of mathematical rules and symmetries. These symmetries follow mathematical structure which can be verified and expressioned in the precise language of a proof assistant. In this paper, we present Prismriver, a formalization of music theory in Lean 4. By formalizing music theory in Lean 4, we open the door to verifiable algorithmic composition and accompaniment generation. We also enable the analysis of monadic analysis of structures in music.

2606.19532 2026-06-19 cs.LO 新提交

Vancomycert: A Certified Neuro-Symbolic Drug Delivery System (Case Study)

Vancomycert: 一种经过认证的神经符号药物递送系统(案例研究)

Alistair Sirman, Fleur Conway, Jessica Ciupa, Gusts Gustavs Grīnbergs, Ekaterina Komendantskaya, Thai Son Hoang, Michael Rawson, Alessandro Bruni, Vaishak Belle, Michael John Williams

AI总结 针对抗生素给药神经网络控制器的形式化验证问题,提出一种结合监督学习和定理证明的方法,确保无限时域内自动给药不超过治疗上限。

详情
AI中文摘要

自主决策的神经网络控制器在网络物理系统中已得到广泛应用,但在安全关键的医疗环境中,其部署仍未得到充分验证。本文提出了一种用于抗生素给药神经网络控制器形式化验证的方法和案例研究,其动机源于系统必须在无限时间范围内同时具备适应性和可证明安全性的挑战。我们构建了一个简化但临床可解释的模型,用于跟踪药物浓度、体温和白细胞计数。万古霉素被选为代表性抗生素,广泛用于严重感染,但治疗窗口狭窄,超治疗浓度有肾毒性风险,而亚治疗剂量可能导致治疗失败。我们使用合成的临床医生式给药数据训练了一个监督式神经网络控制器。我们建立了输入-输出安全属性的形式化验证,特别验证了神经网络的一个属性,该属性意味着无限时域证明自动给药从未超过超治疗边界。该系统的属性在Rocq中使用Vehicle交互式定理证明器后端进行证明,以集成不同的证明系统。最终结果是一个验证流水线,允许各种治疗方法,同时为每个特定患者保持安全性。

英文摘要

Neural network controllers for autonomous decision-making are well-established in cyber-physical systems, yet their deployment in safety-critical healthcare settings remains largely unverified. This paper presents a methodology and case study for the formal verification of a neural network controller for antibiotic dosing, motivated by the challenge of systems that must be simultaneously adaptive and provably safe across unbounded time horizons. We construct a simplified yet clinically-interpretable model that tracks drug concentration, body temperature, and white blood cell count. Vancomycin is selected as a representative antibiotic, widely prescribed for severe infections yet carrying a narrow therapeutic window, where supratherapeutic concentrations risk nephrotoxicity and subtherapeutic dosing risks treatment failure. A supervised neural network controller is trained on synthetic clinician-style dosing data. We establish formal verification of input-output safety properties, specifically verifying a property of a neural network that implies an infinite-horizon proof that automated dosing never exceeds the supratherapeutic boundary. This system property is proven in Rocq using the Vehicle interactive theorem prover back-end to integrate the different proof systems. The end result is a verification pipeline that allows for a wide variety of treatment approaches whilst maintaining safety for each specific patient.

2606.19761 2026-06-19 cs.LO math.LO 新提交

Finishing Oltean's Completeness Proof in Lean 4 for Hybrid Logic $L(\forall)$

在 Lean 4 中完成 Oltean 关于混合逻辑 $L(\forall)$ 的完备性证明

Lars Warren Ericson

AI总结 本文在 Lean 4 中完成了混合逻辑 $L(\forall)$ 的机器检查完备性证明,通过结构新鲜性和存在引理 Henkin 构造两种工具解决了新鲜名称的生成问题。

Comments 147 pages, 5 figures

详情
AI中文摘要

我们给出了一个在 Lean 4 中机器检查的完备性定理,针对混合逻辑 $L(\forall)$:带有名义词、满足风格绑定器 $\forall$ 和盒子模态的命题模态逻辑。(基本混合逻辑(无绑定器)的机器检查完备性由 Asta Halkjær From 在 Isabelle/HOL 中开创。)我们基于 Alex Oltean 2023 年的 Lean 4 形式化工作,该工作机械化了语法、语义、希尔伯特风格证明系统和可靠性(遵循 Blackburn 的混合完备性(1998)),但留下了不完备的部分。完成它需要在两个结构不同的点上制造新鲜名称,我们的核心发现是它们需要两种不同的工具。(1)通过扩展的 Lindenbaum 构造构建的根可证最大一致集,每一步都需要一个对整个集合新鲜的名义词;正确的工具是结构新鲜性:扩展语言,使得通过构造保留无限的名义词供应。我们调查了设计空间(Oltean 在 $\mathbb{N}$ 内的奇偶编码、Bud Mishra 建议的不交和 $N \oplus \mathbb{N}$ 参数化,以及 From 的合成完备性框架)并解释了我们采用的编码。(2)一个最大一致集的可证 $\Diamond$-后继不能通过这种方式获得:其典范盒子归约可证地提及每个名义词,因此没有保留的名称是新鲜的。这里正确的工具是 Oltean 选择但未完成的:一个存在引理 Henkin 构造,通过一个新鲜状态变量从前驱的可证性中抽取每个见证;我们通过一个携带数据的见证累加器和一个紧致性论证完成了它。定理 $\Gamma \models \varphi \to \Gamma \vdash \varphi$ 被完全形式化:该开发是无 sorry 的,且 #print axioms 仅报告 propext、this http URL 和 this http URL。我们将开发移植到 Lean v4.30.0 / mathlib v4.30.0。

英文摘要

We present a machine-checked completeness theorem, in Lean 4, for the hybrid logic $L(\forall)$: propositional modal logic with nominals, the satisfaction-style binder $\forall$, and the box modality. (Machine-checked completeness for basic hybrid logic, without binders, was pioneered by Asta Halkjær From in Isabelle/HOL.) We build on Alex Oltean's 2023 Lean 4 formalization, which mechanized the syntax, semantics, Hilbert-style proof system, and soundness following Blackburn's Hybrid Completeness (1998), but left completeness unfinished. Finishing it requires manufacturing fresh names at two structurally different points, and our central finding is that they call for two different tools. (1) The root witnessed maximal consistent set, built by an extended Lindenbaum construction, needs at each step a nominal fresh for the whole set; the right tool is structural freshness: extend the language so an infinite supply of nominals is reserved by construction. We survey the design space (Oltean's odd/even encoding inside $\mathbb{N}$, the disjoint-sum $N \oplus \mathbb{N}$ parameterization suggested by Bud Mishra, and From's synthetic-completeness frameworks) and explain the encoding we adopt. (2) The witnessed $\Diamond$-successor of a maximal consistent set cannot be obtained this way: its canonical box-reduct provably mentions every nominal, so no reserved name is fresh. Here the right tool is one Oltean chose but left incomplete: an existence-lemma Henkin construction drawing each witness from the predecessor's witnessedness through a fresh state variable; we complete it with a data-carrying witness accumulator and a compactness argument. The theorem $Γ\models φ\to Γ\vdash φ$ is fully formalized: the development is sorry-free, and #print axioms reports only propext, Classical.choice, and Quot.sound. We port the development to Lean v4.30.0 / mathlib v4.30.0.

2606.20492 2026-06-19 cs.CR cs.LO 交叉投稿

A-COMPASS: Formal Foundations for Anonymity Analysis in Microdata

A-COMPASS:微观数据匿名性分析的形式化基础

Tamara Tagliavia, Silvia Ghilezan

AI总结 本文修改COMPASS语言为A-COMPASS,使其适用于微观数据表,支持匿名条件检查与匿名化操作,并证明其语义的确定性和组合性,可用于验证k-匿名和l-多样性等属性。

详情
AI中文摘要

在信息时代,主要问题之一是如何确保个人隐私。根据考虑隐私的背景,出现了各种数据隐私模型。然而,即使对于最基本的模型,这些模型的形式化验证领域仍未得到充分探索。验证隐私需求的一种尝试是合规断言语言(COMPASS)。在COMPASS中,可以指定表需要满足的匿名条件,以及条件不满足时将修改表的操作。它设计用于对预处理后的表进行操作,形式为一条记录对应一组人。在本文中,我们修改COMPASS语言,使其以通常的一条记录对应一个人的形式对微观数据表进行操作。修改后的语言称为A-COMPASS。除了检查先前应用的匿名条件外,A-COMPASS还作为新功能支持执行匿名化操作。我们进一步提供了A-COMPASS语言的语法和语义。我们还证明了引入的语义的最重要属性,如确定性和组合性。最后,我们提供了一种验证匿名属性(如k-匿名和l-多样性)的机制。

英文摘要

In the information age, one of the leading problems is how to ensure individual's privacy. Depending on the context in which privacy is considered, various data privacy models have emerged. However, the domain of formal verification of these models is still not sufficiently explored even when it comes to the most basic models. An attempt to verify privacy requirements is the Compliance Assertion Language (COMPASS). In COMPASS, one can specify an anonymity condition that a table needs to satisfy, and an action that will modify the table if the condition is not satisfied. It is designed to operate on preprocessed tables in a form one record - one group of people. In this paper, we modify the COMPASS language in order to operate on microdata tables in their usual form of one record - one person. The modified language is called A-COMPASS. Along with checking of previously applied anonymity conditions, A-COMPASS enables the execution of anonymization actions as a new feature. We further provide the syntax and the semantics for the A-COMPASS language. We also prove the most important properties of the introduced semantics like determinism and compositionality. Finally, we provide a mechanism to verify anonymity properties, such as k-anonymity and l-diversity.

2606.20292 2026-06-19 cs.LG cs.LO 交叉投稿

Shifting-based Optimizable Linear Relaxations for General Activation Functions

基于平移的可优化线性松弛用于通用激活函数

Philipp Kern, László Antal, Erika Ábráham, Carsten Sinz

发表机构 * Karlsruhe Institute of Technology(卡尔斯鲁厄理工学院) Karlsruhe University of Applied Sciences(卡尔斯鲁厄应用科学大学)

AI总结 提出SLiR方法,通过斜率参数化和平移过程生成任意激活函数的线性松弛,在保持正确性的同时实现高效优化,验证属性数量比现有方法多7.8倍。

Comments 21 pages, under review

详情
AI中文摘要

神经网络(NN)的使用正在迅速增加,包括在安全关键领域。为了提供关于NN行为的正式保证,许多验证方法依赖于激活函数的可优化线性松弛。然而,现有技术依赖于为每个激活函数手工制作的松弛。因此,扩展到最先进的激活函数需要大量手动工作。相比之下,我们的方法SLiR(基于平移的线性松弛)具有广泛的适用性,仅需要Lipschitz常数或一组临界点。SLiR通过斜率参数化松弛,并通过平移过程计算相应的偏移,确保在输入域上的可靠上下界,从而在保持正确性的同时实现高效优化。我们的实验表明,SLiR在广泛的实际激活函数上产生紧致的松弛,并且与最先进的方法相比,能够验证多达7.8倍更多的属性。

英文摘要

The use of neural networks (NNs) is rapidly increasing, including in safety- and security-critical domains. To provide formal guarantees about NN behavior, many verification methods rely on optimizable linear relaxations of activation functions. However, existing techniques depend on hand-crafted relaxations for each activation function. Extension to state-of-the-art activation functions therefore requires substantial manual effort. In contrast, our approach SLiR (Shifting-based Linear Relaxations) is broadly applicable, requiring only a Lipschitz constant or a set of critical points. SLiR parameterizes relaxations by their slope and computes the corresponding offset via a shifting procedure that ensures sound upper and lower bounds over the input domain, enabling efficient optimization while maintaining correctness. Our experiments show that SLiR produces tight relaxations across a wide range of practical activation functions and enables verification of up to 7.8x more properties compared to state-of-the-art methods.

2606.19632 2026-06-19 cs.RO cs.AI cs.LG cs.LO cs.MA 交叉投稿

Formal Verification of Learned Multi-Agent Communication Policies via Decision Tree Distillation

通过决策树蒸馏对学习到的多智能体通信策略进行形式化验证

Ahmad Farooq, Kamran Iqbal

发表机构 * University of Arkansas at Little Rock(阿肯色大学小石城分校)

AI总结 提出通过决策树蒸馏将多智能体强化学习策略转化为可解释模型,并利用PRISM进行形式化验证,确保安全属性转移至原始网络,在无人机编队任务中实现88.9%属性满足率。

Comments 9 pages, 3 figures, 7 tables. Accepted at the 2026 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2026), Pittsburgh, Pennsylvania, USA, September 27-October 1, 2026

详情
AI中文摘要

多智能体强化学习使智能体能够通过涌现通信发展协调策略,但神经策略缺乏无人机群和自动驾驶车队等安全关键机器人部署所需的形式化安全保证。我们提出了首个通过学习策略抽象进行安全验证的端到端框架:神经策略被蒸馏为可解释的决策树,然后进行形式化验证,并通过经验验证确认验证的安全属性可转移至原始网络。我们的四阶段流程包括:从智能体观测中提取领域特定特征;决策树蒸馏达到97.9% +/- 1.2%的神经策略保真度;自动翻译为PRISM概率模型检查器规范,具有完整的特征到状态变量对应关系;以及通过成对分解、联合界聚合和经验邻居建模对概率计算树逻辑属性进行组合验证。评估用于5-7个智能体多无人机协调的矢量量化变分信息瓶颈策略,我们验证了18个涵盖安全性、活性和合作的时间逻辑属性,实现了88.9%的属性满足率,所有五个安全阈值均满足(碰撞概率0.3% vs 阈值1%)。原始神经策略的蒙特卡洛验证确认验证的安全属性转移偏差<=0.6个百分点(95%置信区间)。离散VQ-VIB消息相比连续方法提供+11.6至+13.6个百分点的保真度优势,实现3-4倍更快的验证。我们的框架为蒸馏策略抽象提供了经验验证的安全验证,作为深度多智能体强化学习与多机器人部署形式化安全工作流之间的实用桥梁。

英文摘要

Multi-agent reinforcement learning (MARL) enables agents to develop coordination strategies through emergent communication, but neural policies lack the formal safety guarantees required for safety-critical robotic deployment in drone swarms and autonomous vehicle fleets. We present the first end-to-end framework for safety verification of learned multi-agent communication policies through policy abstraction: neural policies are distilled into interpretable decision trees, then formally verified, with empirical validation confirming that verified safety properties transfer to original networks. Our four-stage pipeline consists of domain-specific feature extraction from agent observations, decision tree distillation achieving 97.9% +/- 1.2% fidelity to neural policies, automated translation to PRISM probabilistic model checker specifications with complete feature-to-state-variable correspondence, and compositional verification of Probabilistic Computation Tree Logic (PCTL) properties via pairwise decomposition with union-bound aggregation and empirical neighbor modeling. Evaluating Vector-Quantized Variational Information Bottleneck (VQ-VIB) policies for multi-drone coordination with 5-7 agents, we verify 18 temporal logic properties across safety, liveness, and cooperation, achieving 88.9% property satisfaction with all five safety thresholds satisfied (0.3% collision probability vs. 1% threshold). Monte Carlo validation of original neural policies confirms that verified safety properties transfer with <=0.6 percentage-point deviation (95% CI). Discrete VQ-VIB messages provide +11.6 to +13.6 percentage-point fidelity advantages over continuous methods, enabling 3-4x faster verification. Our framework provides empirically validated safety verification for distilled policy abstractions, serving as a practical bridge between deep MARL and formal safety workflows for multi-robot deployment.

2606.19588 2026-06-19 cs.AI cs.CR cs.LO 交叉投稿

Analyzing the Narration Gap in LLM-Solver Loops

分析大语言模型-求解器循环中的叙述差距

Zunchen Huang, Songgaojun Deng

发表机构 * Eindhoven University of Technology(埃因霍温理工大学)

AI总结 研究LLM与SAT/SMT求解器混合推理中,将求解器输出转化为用户答案的叙述步骤存在的安全漏洞,通过形式化建模和实验评估发现证书门控可保证求解结果正确,但对抗攻击可反转结论。

详情
AI中文摘要

诸如SAT和SMT求解器之类的形式化工具,当安全或安保关键问题可以用逻辑表述时,越来越多地被嵌入到语言模型推理流程中。与思维链不同(其步骤从模型分布中采样,没有形式化保证),求解器产生可靠且可独立验证的答案。然而,这种可靠性保证可能在求解器与模型之间的交互中丢失。混合流程包含三个组成部分:形式化问题、求解问题以及叙述结果。先前的工作研究了形式化和求解,但未涉及叙述——即将形式化工具的输出转化为用户答案的步骤。为了填补叙述差距,我们首先将LLM-求解器循环建模为经过验证的决策过程。我们进一步在提示注入下评估了五个开源模型,发现证书门控使求解器判定可靠,而攻击者可以通过不同措辞和渠道反转已验证的结论。我们研究了通过强化提示进行缓解的方法,该方法显著减少了注入但无法完全消除,并且在自适应攻击下仍然存在问题。结合形式化分析和实证研究,我们表明在LLM-求解器循环中,鲁棒性无法延伸到用户最终读取的答案。

英文摘要

Formal tools such as SAT and SMT solvers are increasingly embedded in language model reasoning pipelines when a safety or security critical question can be formulated in logic. Unlike chain of thought whose steps are sampled from the model distribution without formal guarantee, a solver produces a sound and independently verifiable answer. However, the soundness guarantee can be lost in the interaction between the solver and the model. The hybrid pipeline has three components: formalizing the question, deciding it, and narrating the result. Prior work has studied the formalization and decision, but not narration, which is the step that turns a formal tool's output into the user answer. To fill the narration gap, we first model the LLM-solver loop as a verified decision procedure. We further evaluate five open-sourced models under prompt injection, and we find certificate gating makes the solver verdict sound, while an adversary can invert a verified conclusion across phrasings and channels. We study the mitigation through hardened prompt that reduces injection significantly but cannot eliminate it and still suffers under adaptive attack. Combining the formal analysis and empirical studies, we show in the LLM-solver loop, robustness does not reach to the answer that the user finally reads.

2606.19399 2026-06-19 cs.LG cs.AI cs.LO cs.PL 交叉投稿

VERITAS: Verifier-Guided Proof Search for Zero-Shot Formal Theorem Proving

VERITAS:验证器引导的零样本形式定理证明搜索

Manish Acharya, Zhenyu Liao, Yueke Zhang, Kevin Leach, Yu Huang, Yifan Zhang

发表机构 * Department of Computer Science, Vanderbilt University(范德堡大学计算机科学系) Amazon(亚马逊)

AI总结 提出VERITAS框架,通过两阶段协议(Best-of-N采样+批评引导MCTS)利用验证器反馈进行零样本定理证明,在miniF2F上达40.6%准确率,并发布组合学基准VERITAS-CombiBench。

详情
AI中文摘要

基于LLM的形式化证明器通常将丰富的验证器信号(语法错误、类型不匹配、部分目标进展)压缩为二进制的通过/失败位。我们提出VERITAS,一个零样本框架,通过两阶段协议将每个验证器信号路由回证明搜索:首先进行Best-of-N采样,然后进行批评引导的MCTS遍历,该遍历将第一阶段失败作为显式负例吸收。该协议保留其第一阶段扫描解决的每个定理,因此第二阶段额外的解决可归因于反馈驱动的探索。VERITAS在miniF2F上达到40.6%(相比之下,独立运行的Best-of-5为36.9%,Portfolio为26.2%),在VERITAS-CombiBench上达到7.3%,这是一个我们发布的55个定理的组合学基准,在该基准上Best-of-5(1.8%)低于Portfolio(3.6%),暴露了当必须从验证器反馈中迭代恢复正确的引理名称时,无指导的采样会带来损害。工件可在GitHub上获取。

英文摘要

LLM-based formal provers often collapse rich verifier signals (syntax errors, type mismatches, partial goal progress) into a binary pass/fail bit. We present VERITAS, a zero-shot framework that routes every verifier signal back into proof search through a two-phase protocol: Best-of-N sampling first, then a critic-guided MCTS pass that ingests Phase 1 failures as explicit negative examples. The protocol preserves every theorem solved by its own Phase 1 sweep, so Phase 2's additional solves are attributable to feedback-driven exploration. VERITAS reaches 40.6% on miniF2F (vs. an independently run Best-of-5 at 36.9%, Portfolio 26.2%) and 7.3% on VERITAS-CombiBench, a 55-theorem combinatorics benchmark we release on which Best-of-5 (1.8%) falls below Portfolio (3.6%), exposing that unguided sampling hurts when correct lemma names must be recovered iteratively from verifier feedback. Artifacts are available on GitHub.

2606.19492 2026-06-19 math.LO cs.LO math.RA 交叉投稿

Functional completeness and primitive positive decomposition of relations on finite domains

有限域上关系的功能完备性与原始正分解

Sergiy Koshkin

AI总结 提出一种新的初等方法,将高元关系原始正分解为二元关系,利用多值逻辑中2输入函数的功能完备性,将关系解释为部分定义的多值函数图,并通过函数分解有效实现。

Comments 19 pages, no figures

Journal ref Logic Journal of the IGPL, Volume 33, Issue 2, April 2025, jzae077

详情
AI中文摘要

我们给出了一种新的初等方法,将有限域上的高元关系原始正分解为二元关系。这种分解在约束满足问题、克隆理论和关系数据库的应用中出现。该构造利用多值逻辑中2输入函数的功能完备性,将关系解释为部分定义的多值'函数'的图。然后,这些'函数'由通常意义上的普通函数复合而成。该构造在计算上是有效的,并依赖于成熟的函数分解方法,但仅将关系约简为三元关系。另一个构造随后将三元关系分解为二元关系,也是有效的,通过将某些析取转换为存在量化。结果给出了有限域上皮尔斯约简论点的统一证明,并表明任何Sheffer函数的图都能复合出所有关系。

英文摘要

We give a new and elementary construction of primitive positive decomposition of higher arity relations into binary relations on finite domains. Such decompositions come up in applications to constraint satisfaction problems, clone theory and relational databases. The construction exploits functional completeness of 2-input functions in many-valued logic by interpreting relations as graphs of partially defined multivalued 'functions'. The 'functions' are then composed from ordinary functions in the usual sense. The construction is computationally effective and relies on well-developed methods of functional decomposition, but reduces relations only to ternary relations. An additional construction then decomposes ternary into binary relations, also effectively, by converting certain disjunctions into existential quantifications. The result gives a uniform proof of Peirce's reduction thesis on finite domains, and shows that the graph of any Sheffer function composes all relations there.

2605.03064 2026-06-19 cs.LO 版本更新

Neural networks as fuzzy logic formulas

神经网络作为模糊逻辑公式

Damian Heiman, Antti Kuusisto, Esko Turunen

AI总结 本文通过Rational Pavelka逻辑及其扩展,为有理权重ReLU激活的神经网络提供了模糊逻辑刻画,并推广到允许任意实数值激活的广义多项式环。

详情
AI中文摘要

神经网络是现代人工智能的一个基本方面,在包括Transformer和图神经网络在内的各种重要机器学习架构中扮演着关键角色。最近,逻辑刻画已被用于研究许多机器学习架构的表达能力,但普通神经网络的逻辑刻画受到的关注较少。在本文中,我们通过Rational Pavelka逻辑($\mathrm{RPL}$)及其扩展$\mathrm{RPL}(\odot)_{\leq 1}$,以及$\mathit{L \Pi} \frac{1}{2}$的两个片段$\mathit{L \Pi} \frac{1}{2}(\rightarrow_{P}^-)_{\leq 1}$和$\mathit{L \Pi} \frac{1}{2}(\odot^-, \rightarrow_{P}^-)$,为有理权重ReLU激活的神经网络提供了模糊逻辑刻画。神经网络的激活值允许为任意实数。我们还通过模糊逻辑$\mathrm{RPL}(\odot)$和$\mathit{L \Pi} \frac{1}{2}$的一个片段$\mathit{L \Pi} \frac{1}{2}(\rightarrow_{P}^-)$,为可数多个变量上允许使用ReLU函数的广义多项式环$\mathbb{Q}$提供了模糊逻辑刻画。

英文摘要

Neural networks are a fundamental aspect of modern artificial intelligence, playing a key role in various important machine learning architectures including transformers and graph neural networks. Recently, logical characterisations have been used to study the expressive power of many machine learning architectures, but logical characterisations of plain neural networks have received less attention. In this paper, we provide fuzzy logic characterisations of rational-weight ReLU-activated neural networks via Rational Pavelka logic ($\mathrm{RPL}$) and an extension of $\mathrm{RPL}$ called $\mathrm{RPL}(\odot)_{\leq 1}$, as well as two fragments of $\mathit{L Π} \frac{1}{2}$ called $\mathit{L Π} \frac{1}{2}(\rightarrow_{P}^-)_{\leq 1}$ and $\mathit{L Π} \frac{1}{2}(\odot^-, \rightarrow_{P}^-)$. The activation values of the neural networks are allowed to be arbitrary real numbers. We also provide fuzzy logic characterisations of a generalised polynomial ring over $\mathbb{Q}$ in countably many variables where the use of the ReLU-function is permitted via the fuzzy logic $\mathrm{RPL}(\odot)$ and a fragment of $\mathit{L Π} \frac{1}{2}$ called $\mathit{L Π} \frac{1}{2}(\rightarrow_{P}^-)$.

2605.20531 2026-06-19 cs.LO cs.LG 版本更新

Pseudo-Formalization for Automatic Proof Verification

伪形式化用于自动证明验证

Slim Barkallah, Luke Bailey, Kaiyue Wen, Mohammed Abouzaid, Tengyu Ma

发表机构 * GitHub

AI总结 本文提出了一种名为伪形式化的证明格式,该格式在保持自然语言灵活性的同时,保留了形式证明的模块性和精确性,通过块验证算法实现了对自然语言证明的高效验证,其在错误发现的精度和召回率上优于现有基线方法。

Comments 31 pages, code available at https://github.com/Slim205/pseudo-formalization

详情
AI中文摘要

可靠的证明验证仍然是训练和评估在复杂数学推理上的人工智能系统的主要瓶颈。在像Lean这样的语言中,完全形式化的证明容易验证,因为它们是无歧义且模块化的。大多数证明,尤其是由人工智能系统编写证明,既没有这种属性,将它们翻译成形式语言在许多前沿数学领域仍然具有挑战性。我们提出了伪形式化(PF),一种证明格式,它捕捉了形式证明的模块性和精确性,同时保留了自然语言的灵活性。一个伪形式化证明被分解成自包含的模块,每个模块陈述其前提、结论和证明,用自然语言。为了验证一个常规的自然语言证明的正确性,一个LLM将其翻译成伪形式化,然后独立验证每个模块,我们称之为块验证(BV)。我们在两个涵盖竞赛和研究级数学的基准上评估PF+BV,其中它在错误发现的精度和召回率上优于LLM-as-judge基线。为了支持未来的工作,我们发布了我们的研究级证明验证基准ArxivMathGradingBench。

英文摘要

Reliable verification of proofs remains a bottleneck for training and evaluating AI systems on hard mathematical reasoning. Fully formal proofs, in languages like Lean, are easy to verify because they are unambiguous and modular. Most proofs, particularly those written by AI systems, have neither property, and translating them into formal languages remains challenging in many frontier math settings. We propose Pseudo-Formalization (PF), a proof format that captures the modularity and precision of formal proofs while retaining the flexibility of natural language. A Pseudo-Formal proof is decomposed into self-contained modules, each stating its premises, conclusion, and proof in natural language. To verify the correctness of a regular natural language proof, an LLM translates it to Pseudo-Formal and then verifies each module independently, an algorithm we call Block Verification (BV). We evaluate PF+BV on two benchmarks spanning olympiad and research-level mathematics, where it pareto-dominates LLM-as-judge baselines on error-finding precision and recall. To support future work, we release our research-level proof verification benchmark ArxivMathGradingBench.

2505.05306 2026-06-19 cs.LO 版本更新

The calculus of neo-Peircean relations

新皮尔士关系演算

Filippo Bonchi, Alessandro Di Giorgio, Nathan Haydon, Pawel Sobocinski

AI总结 通过从笛卡尔语法转向单子图语法,提出新皮尔士关系演算,其表达能力与一阶逻辑相当,从而规避了关系演算不可有限公理化的经典结论。

Comments arXiv admin note: substantial text overlap with arXiv:2401.07055

详情
AI中文摘要

关系演算由德摩根和皮尔士在19世纪下半叶引入,作为布尔类代数的扩展。后来弗雷格和皮尔士本人对量化理论的发展,为今天所知的一阶逻辑铺平了道路,导致关系演算被长期遗忘。直到1941年,塔斯基提出了关于其是否存在完全公理化的问题。这个问题只得到了否定的答案:关系演算及其许多片段没有有限公理化,后来由若干不可能定理证明。在本文中,我们表明——通过从传统语法(笛卡尔)转向图解语法(单子)——可以为整个演算提供完全公理化。不可能定理被规避,因为我们的演算,称为新皮尔士关系演算,比关系演算更具表达力,实际上与一阶逻辑一样具有表达力。公理是通过结合两个著名的范畴结构:笛卡尔双范畴和线性双范畴而获得的。

英文摘要

The calculus of relations was introduced by De Morgan and Peirce during the second half of the 19th century, as an extension of Boole's algebra of classes. Later developments on quantification theory by Frege and Peirce himself, paved the way to what is known today as first-order logic, causing the calculus of relations to be long forgotten. This was until 1941, when Tarski raised the question on the existence of a complete axiomatisation for it. This question found only negative answers: there is no finite axiomatisation for the calculus of relations and many of its fragments, as shown later by several no-go theorems. In this paper we show that -- by moving from traditional syntax (cartesian) to a diagrammatic one (monoidal) -- it is possible to have complete axiomatisations for the full calculus. The no-go theorems are circumvented by the fact that our calculus, named the calculus of neo-Peircean relations, is more expressive than the calculus of relations and, actually, as expressive as first-order logic. The axioms are obtained by combining two well known categorical structures: cartesian and linear bicategories.