arXivDaily: arXiv daily academic digest, updated Monday to Friday
2606.06391 2026-06-05 stat.ML cs.LG

Conformal Risk Sharing: Certified Cost Allocation with Participation Guarantees

Ieva Kazlauskaite

Affiliations: Ieva Kazlauskaite

AI summary: Proposes Conformal Risk Sharing, which pairs an interpretable sharing policy with split conformal calibration to allocate the financial impact of rare events from finite data without distributional assumptions, giving each participant an obligation cap and verifying that no one is made worse off.

Abstract

Sharing the financial impact of rare adverse events across a group can soften extreme individual burdens, but any participant made worse off by the arrangement has reason to leave. A credible mechanism must therefore provide each agent with a trustworthy cap on their future obligation and should be deployed only if the aggregate harm across participants is bounded. We formalise this as the Certified Allocation Problem: from finite data and without distributional assumptions, find a redistribution rule, produce obligation caps for every participant, and verify that no participant is made materially worse off. We propose Conformal Risk Sharing, which solves this problem by pairing an interpretable sharing policy with split conformal calibration. The sharing intensity is tuned on training data, while held-out calibration data produces distribution-free per-agent guarantees (valid under exchangeability). Experiments on synthetic and real-world data, including precipitation and energy-cooperative data, confirm that the framework can substantially reduce extreme obligations for high-risk agents while controlling harm to others.
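
As a rough illustration of the split conformal step mentioned in the abstract, the sketch below computes a distribution-free cap on one agent's obligation from held-out calibration values. The function name `conformal_cap`, the miscoverage level `alpha`, and the sample numbers are assumptions made for illustration; the paper's actual scores and sharing policy are not described in the abstract.

```python
import math

def conformal_cap(calibration_obligations, alpha=0.1):
    """Cap that a new exchangeable obligation exceeds with probability at most alpha."""
    n = len(calibration_obligations)
    # Finite-sample corrected rank used by split conformal prediction.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: only the trivial cap is valid.
        return math.inf
    return sorted(calibration_obligations)[rank - 1]

# Hypothetical held-out obligations for a single agent (illustrative numbers only).
calibration = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
cap = conformal_cap(calibration, alpha=0.2)
```

The guarantee behind this kind of cap relies only on exchangeability of the calibration and future values, which matches the validity condition stated in the abstract.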

2606.06373 2026-06-05 eess.SP cs.AI

LatentWave: JEPA Pretraining for Wireless Foundation Models

Ahmed Mohamed, Ahmed Aboulfotouh, Hatem Abou-Zeid

Affiliations: University of California, San Diego

AI summary: Proposes LatentWave, which uses a Joint-Embedding Predictive Architecture (JEPA) to predict masked regions in latent space, learning transferable wireless signal representations and outperforming a masked-modeling baseline on four downstream tasks.

Abstract

Wireless foundation models have emerged as a promising alternative to building separate models for each wireless task. However, existing approaches rely on masked input reconstruction, which can bias representations toward low-level signal details. In this paper, we propose LatentWave, a wireless foundation model pretrained using a Joint-Embedding Predictive Architecture (JEPA) on diverse wireless spectrograms and channel state information (CSI). By predicting masked regions in latent space, LatentWave learns representations that are more transferable out of the box across diverse downstream tasks. The proposed architecture employs per-channel patch embeddings with stochastic channel sampling during pretraining, allowing it to process variable antenna counts and improving usability across heterogeneous wireless configurations. We evaluate LatentWave on four downstream tasks: RF signal classification, 5G NR positioning, beam prediction, and LoS/NLoS classification, comparing against a masked-modeling baseline (WavesFM) pretrained on the same data. Additionally, we show that the masking geometry introduces a task-dependent inductive bias: frequency masking strongly favors channel-related tasks such as positioning and beam prediction, while region masking better preserves discriminability for signal classification.

2606.06351 2026-06-05 stat.ML cs.LG

Function-Space Priors for Bayesian Neural ODEs with Application to Vessel Trajectory Prediction

Jaeyeong Lee, Wonmo Koo, Heeyoung Kim

Affiliations: Department of Industrial and Systems Engineering, Korea Advanced Institute of Science and Technology (KAIST)

AI summary: To address irregular sampling, missing reports, and complex dynamics in vessel trajectory prediction, proposes a regularization method that imposes a Gaussian-process kernel prior on the vector field, combined with probabilistic multiple shooting for uncertainty quantification over long sequences.

Abstract

Vessel trajectory prediction from Automatic Identification System (AIS) data is essential for maritime situational awareness, yet it remains challenging due to irregular sampling, missing reports, and complex dynamics. Beyond accurate point forecasts, maritime applications also demand well-calibrated uncertainty estimates for reliable decision-making. Bayesian Neural Ordinary Differential Equations (ODEs) offer a principled framework for continuous-time trajectory modeling with uncertainty quantification by placing a prior over the neural vector field parameters. However, the commonly used isotropic Gaussian weight prior fails to encode informative structural properties of vessel dynamics, such as smoothness and locality. Existing function-space Bayesian neural network methods address this limitation for static mappings, but do not transfer directly to Neural ODEs, where the primary quantity of interest is the trajectory rather than the vector field itself. In principle, one could place a Gaussian process (GP) prior directly over ODE solutions, but this requires propagating distributions through a nonlinear ODE solver, which is analytically intractable. To address this challenge, we adopt a practical approach that imposes a GP-kernel-based prior directly on the vector field evaluated at a finite set of measurement points. Specifically, we augment the standard weight-space variational objective with a kernel-based regularizer that penalizes deviations of the vector field from the structure implied by a GP prior. To handle long and irregular AIS trajectories, we further combine this function-space regularization with probabilistic multiple shooting, which decouples inference across temporal segments while maintaining global consistency.

2606.06347 2026-06-05 eess.SY cs.LG cs.SY

Attack Detection using Time Series Foundation Models

Sribalaji C. Anand, Anh Tung Nguyen, George J. Pappas

Affiliations: University of Pennsylvania; KTH Royal Institute of Technology; Uppsala University

AI summary: For cyber-physical systems with no model knowledge, proposes a zero-shot attack detection method based on the TimesFM time-series foundation model and validates it on the IEEE 14-bus power system.

Comments: Under review

Abstract

This paper addresses the problem of attack detection in cyber-physical systems without any knowledge of the plant model or its structure. A remotely located plant transmits sensor measurements to an operator over a network that is assumed to be under attack. We consider two classes of attacks: model-free replay attacks and model-based stealthy attacks. For the latter, we derive closed-form expressions for the optimal stealthy attack policy against a $\chi^2$ detector, for both linear and nonlinear systems. We then propose a model-structure-free detector based on TimesFM, a time-series foundation model developed by Google Research, which serves as a surrogate residual generator operating in a zero-shot fashion. We show empirically that the TimesFM-based detector achieves comparable or superior attack detection performance. The efficacy of the proposed approach is demonstrated numerically on the IEEE 14-bus power system. We also demonstrate that TimesFM predictions can serve as a substitute for corrupted measurements, a practical mitigation technique when classical redundancy assumptions fail.
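
To make the residual-based detection idea concrete, here is a minimal sketch of a chi-square test on the gap between a measurement and a forecast. The forecast values are hard-coded stand-ins for whatever a predictor such as TimesFM would return; no TimesFM API is called, and the function names, the diagonal-covariance assumption, and the numbers are illustrative only.

```python
def chi2_statistic(measurement, prediction, variances):
    """Sum of squared residuals, each scaled by its variance (independent channels assumed)."""
    return sum((y - y_hat) ** 2 / var
               for y, y_hat, var in zip(measurement, prediction, variances))

def is_attack(measurement, prediction, variances, threshold):
    """Raise an alarm when the normalized residual energy exceeds the threshold."""
    return chi2_statistic(measurement, prediction, variances) > threshold

# The 0.95 quantile of a chi-square distribution with 2 degrees of freedom is about 5.99.
THRESHOLD = 5.99
variances = [0.25, 0.25]          # hypothetical residual variances per sensor
forecast = [1.0, 2.1]             # hypothetical forecast for the current time step

nominal = is_attack([1.1, 2.0], forecast, variances, THRESHOLD)
attacked = is_attack([3.0, 2.0], forecast, variances, THRESHOLD)
```

In this toy case the first measurement stays close to the forecast and passes, while the second deviates strongly on one sensor and triggers the alarm.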

2606.06342 2026-06-05 stat.ML cs.LG

Symmetric Divergence and Normalized Similarity: A Unified Topological Framework for Representation Analysis

Yan Wang, Tianyang Hu

Affiliations: School of Data Science, The Chinese University of Hong Kong, Shenzhen

AI summary: Proposes Symmetric Representation Topology Divergence (SRTD) and Normalized Topological Similarity (NTS), which address the asymmetry and the unboundedness of existing topological divergences, respectively, enabling fine-grained structural diagnosis and standardized cross-scenario evaluation.

Comments: Accepted by TMLR

Abstract

Topological Data Analysis (TDA) offers a principled, intrinsic lens for comparing neural representations. However, existing paired topological divergences (e.g., RTD) are limited by heuristic asymmetry and, more critically, unbounded scores that depend on sample size, hindering reliable cross-scenario benchmarking. To address these challenges, we develop a unified topological toolkit serving two complementary needs: fine-grained structural diagnosis and robust, standardized evaluation. First, we complete the RTD framework by introducing Symmetric Representation Topology Divergence (SRTD) and its efficient variant SRTD-lite. Beyond resolving the theoretical asymmetry of prior variants, SRTD consolidates diagnostic information into a single, comprehensive cross-barcode signature. This allows for precise localization of structural discrepancies and serves as an effective optimization objective without the overhead of dual directional computations. Second, to enable reliable benchmarking across heterogeneous settings, we propose Normalized Topological Similarity (NTS). By measuring the rank correlation of hierarchical merge orders, NTS yields a scale-invariant metric bounded between -1 and 1, effectively overcoming the scale and sample-dependence of unnormalized divergences. Experiments across synthetic and real-world deep learning settings demonstrate that our toolkit captures functional shifts in CNNs missed by geometric measures and robustly maps LLM genealogy even under distance saturation, offering a rigorous, topology-aware perspective that complements measures like CKA.

2606.06316 2026-06-05 quant-ph cs.AI cs.DS

Quantum enhanced rare event discovery and sampling

Naixu Guo, Po-Wei Huang, Qisheng Wang, Jayne Thompson, Patrick Rebentrost, Mile Gu, Chengran Yang

Affiliations: Centre for Quantum Technologies, National University of Singapore; Mathematical Institute, University of Oxford; School of Computer Science, Shanghai Jiao Tong University; School of Informatics, University of Edinburgh; College of Computing and Data Science, Nanyang Technological University; Nanyang Quantum Hub, School of Physical and Mathematical Sciences, Nanyang Technological University

AI summary: For discovering and sampling rare events of extremely low probability, proposes a quantum algorithm that does not need to know in advance which events are rare, achieves the optimal quantum scaling with the rarity threshold, and yields a quadratic speedup for heavy-tailed systems and a robust polynomial speedup for stationary stochastic processes.

Comments: 36 pages (8+28)

Abstract

Financial crashes, cascading failures in infrastructure, and critical errors in AI systems are frequently triggered by events that occur with extremely small probability. Efficiently discovering and sampling events with probability below a threshold is therefore of critical interest. Yet this task is highly non-trivial using existing classical or quantum methods. Being rare, such events require an immense sampling overhead to collect sufficient data samples. Moreover, because the rare events are not known in advance, they cannot be flagged for amplification using standard techniques. Here, we introduce a quantum algorithm for rare-event discovery and sampling without first learning which events are rare. The algorithm achieves the optimal quantum scaling with the rarity threshold. We further demonstrate that this can achieve a quadratic speedup for heavy-tailed systems whose tail has nonvanishing total mass, and translates into a robust polynomial speedup for stationary stochastic processes, with the exponent determined by its entropy-rate structure.

2606.06314 2026-06-05 math.NA cs.LG cs.NA stat.ML

DAS-PINNs for high-dimensional partial differential equations: extending deep adaptive sampling to spacetime domains

Anshima Singh, David J. Silvester

Affiliations: University of Manchester; Department of Mathematics

AI summary: Proposes a deep adaptive sampling framework based on normalizing flows that treats space and time as a unified domain, uses the residual distribution to automatically identify high-residual regions and generate sampling points there, and effectively solves high-dimensional time-dependent PDEs with localized dynamic features.

Abstract

Time-dependent high-dimensional partial differential equations (PDEs) with spatially localised and dynamically evolving solutions pose a fundamental challenge for physics-informed neural networks (PINNs), as uniform collocation sampling becomes increasingly ineffective in high-dimensional spatiotemporal domains. In this work, a deep adaptive sampling framework for PINNs is extended to the time-dependent setting by treating space and time as a unified domain without any explicit time marching. A normalising flow neural network model effectively learns the distribution induced by the PDE residual and generates new collocation points concentrated in regions where the solution is most difficult to learn. Unlike conventional adaptive strategies that require explicit time stepping or moving meshes, high-residual regions are automatically identified and tracked across both space and time, driven purely by the PDE residual distribution. The effectiveness of the proposed strategy is assessed on a range of benchmark problems, from sharp and moving features in two spatial dimensions to localised structures in up to eight spatial dimensions.
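
The core idea of concentrating collocation points where the residual is large can be sketched without a normalising flow. The example below swaps the flow for plain weighted resampling over a fixed candidate pool of spacetime points, so it illustrates residual-driven sampling in general rather than the paper's method; `resample_collocation`, `toy_residual`, and the grid are hypothetical.

```python
import random

def resample_collocation(candidates, residual, n_new, seed=0):
    """Draw n_new (x, t) points with probability proportional to residual(x, t) ** 2."""
    rng = random.Random(seed)
    weights = [residual(x, t) ** 2 for x, t in candidates]
    return rng.choices(candidates, weights=weights, k=n_new)

def toy_residual(x, t):
    """Hypothetical residual that is sharply peaked along the moving front x = t."""
    return 1.0 / (1e-3 + (x - t) ** 2)

# Candidate pool: a uniform 21 x 21 grid over the unit spacetime square.
grid = [(i / 20, j / 20) for i in range(21) for j in range(21)]
new_points = resample_collocation(grid, toy_residual, n_new=200)

# Count how many of the new points landed close to the front.
near_front = sum(abs(x - t) < 0.1 for x, t in new_points)
```

Because space and time are sampled jointly, the new points follow the front as it moves, with no explicit time stepping; a learned density model would play the role of the fixed candidate pool in higher dimensions.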

2606.06313 2026-06-05 physics.flu-dyn cs.LG physics.comp-ph

Wall Shear Stress Reconstruction from Concentration: Differentiable Physics and Physics-Informed Neural Networks

Mahmoud Elhadidy, Siva Viknesh, Roshan M. D'Souza, Amirhossein Arzani

Affiliations: Department of Mechanical Engineering, University of Utah, Salt Lake City, UT, USA; Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, UT, USA; Department of Mechanical Engineering, University of Wisconsin–Milwaukee, Milwaukee, WI, USA; Department of Biomedical Engineering, University of Utah, Salt Lake City, UT, USA

AI summary: Reconstructs wall shear stress from spatially limited passive scalar observations using two inverse methods, a differentiable physics framework and physics-informed neural networks, and shows that measurement location and inverse formulation jointly determine reconstruction accuracy.

Abstract

Wall shear stress (WSS) governs near-wall transport dynamics and is a key hemodynamic indicator in cardiovascular flows, yet remains difficult to infer accurately due to the need for precise computation of near-wall velocity gradients. Passive scalar fields, such as concentration or temperature, are advected by the same underlying velocity field and have the potential to uncover hidden flow physics metrics such as WSS. In this work, we demonstrate such reconstruction from spatially limited passive scalar observations using two fundamentally different inverse frameworks: a differentiable physics framework based on discrete adjoint, PDE-constrained optimization, which enforces the governing equations as hard constraints, and physics-informed neural networks (PINNs), which treat them as soft constraints. Benchmark problems include a 2D canonical backward-facing step (2D-BFS) and a 3D patient-specific stenotic coronary artery. For the 2D-BFS case, evaluated under three measurement scenarios (near-wall, far-field, and combined), PINN achieves high accuracy when near-wall data are available but fails when restricted to far-field measurements, whereas the differentiable physics approach recovers accurate WSS across all scenarios. In the 3D patient-specific case, the differentiable physics framework outperforms PINNs, yielding accurate WSS reconstruction. These results establish that measurement location and inverse formulation jointly determine reconstruction fidelity in scalar-based near-wall flow inference. The proposed framework opens a path toward estimation of near-wall hemodynamics from scalar transport data, with broader applicability to fluid flow problems where passive scalars can be observed.
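
For orientation, the two quantities linked in this abstract can be written in their standard textbook form, assuming an incompressible Newtonian fluid. The symbols are introduced here for illustration and are not taken from the paper: $\mu$ is the dynamic viscosity, $\mathbf{u}_t$ the wall-tangential velocity, $n$ the wall-normal coordinate, $c$ the passive scalar, and $D$ its diffusivity.

```latex
\boldsymbol{\tau}_w = \mu \left. \frac{\partial \mathbf{u}_t}{\partial n} \right|_{\text{wall}},
\qquad
\frac{\partial c}{\partial t} + \mathbf{u} \cdot \nabla c = D \, \nabla^2 c
```

The scalar equation contains the velocity field $\mathbf{u}$, which is why observations of $c$ carry information about the near-wall velocity gradient that defines the wall shear stress.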

2606.06288 2026-06-05 stat.ML cs.LG

Discrete Causal Representations from Heterogeneous Domains: A Bayesian Approach with Social Survey Applications

Ankur Garg, Michael Stettler, Aaron Schein, Julius von Kügelgen

Affiliations: Department of Statistics, University of Chicago; University of Tübingen; Department of Statistics & Data Science Institute, University of Chicago; Seminar for Statistics, ETH Zürich

AI summary: Proposes a Bayesian method for learning discrete causal concepts from multi-environment data, approximates the multimodal posterior with sequential Monte Carlo sampling, and shows on social survey data that it infers meaningful high-level concepts and causal relations.

Abstract

Causal representation learning aims to infer the high-level latent causal concepts that give rise to observed low-level measurements. This is particularly relevant for heterogeneous data from different environments or domains since distribution shifts often arise through sparse, localized changes in some of the underlying causal mechanisms, while other parts of the generative process remain unchanged. Whereas identifiability of causal representations has been studied extensively, practical uncertainty-aware methods and real-world use cases remain less explored. In this work, we propose a Bayesian approach to learning causal representations from multi-environment data, focusing on the case of discrete causal concepts and unknown multi-node soft interventions. To this end, we translate causal assumptions and interpretability desiderata into suitable priors and parametric choices within a hierarchical model. We then devise an inference scheme based on sequential Monte Carlo sampling to approximate the resulting multimodal posterior. We showcase our approach through case studies on social survey data, where latent causal concepts correspond to cultural values or political opinions, measurements to survey responses, and environments to different countries or states. Our model infers meaningful high-level concepts and plausible causal relations among them, demonstrating its utility for learning causal representations of complex real-world data.

2606.06273 2026-06-05 cs.IT cs.AI math.IT

Adapting Diffusion Language Models for Lossless Pixel-Level Image Transmission

Tianqi Ren, Rongpeng Li, Xianfu Chen, Yingyu Li, Zhifeng Zhao

Affiliations: College of Information Science and Electronic Engineering, Zhejiang University; Shenzhen CyberAray Network Technology Co., Ltd; School of Mechanical Engineering and Electronic Information, China University of Geosciences; Zhejiang Lab

AI summary: Proposes DDM-SSCC, a separate source-channel coding framework based on discrete diffusion models that achieves lossless pixel-level image transmission through synchronized reverse arithmetic coding under bidirectional attention, and introduces a Halton-guided denoising order, a mask-ratio-aware cosine schedule, and a lightweight temperature calibration module to improve performance.

Abstract

Lossless pixel-level image transmission is a fundamental regime beyond semantic communications, because exact recovery requires both accurate symbol probability modeling and reliable delivery over noisy channels. This paper proposes DDM-SSCC, a discrete-diffusion-model-based separate source-channel coding framework for lossless image transmission. Different from raster-order autoregressive coding, the proposed source codec adapts a diffusion language model to pixel-token restoration and performs synchronized reverse arithmetic coding under bidirectional attention, allowing multiple masked tokens to be coded within one reverse denoising step. This progressive restoration process also yields a more favorable source representation for noisy transmission, since newly restored tokens can serve as bidirectional context in subsequent denoising steps. To bridge the gap between generation-oriented masked denoising and lossless arithmetic coding, we further introduce a Halton-guided denoising order, a mask-ratio-aware cosine schedule, and a lightweight temperature calibration module. These designs respectively improve spatial coverage, adapt the denoising pace to context reliability, and calibrate the probability tables used by arithmetic coding. Experiments on CIFAR10, DIV2K-LR-X4, and Kodak over additive white Gaussian noise and Rayleigh fading channels show that DDM-SSCC achieves better exact-recovery performance than representative lossless and semantic communication baselines, while ablation studies verify the effectiveness of the proposed denoising order, schedule, and calibration modules.
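
One way to picture a Halton-guided order is to let a two-dimensional Halton sequence decide which pixel positions are visited first, so that early steps spread evenly over the image. The sketch below is a generic construction under that assumption; the bases (2, 3), the function names, and the mapping from sequence points to pixels are illustrative and not taken from the paper.

```python
def radical_inverse(index, base):
    """Van der Corput radical inverse of a positive integer in the given base."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result

def halton_order(height, width):
    """Visit every pixel once, in the order a 2D Halton sequence first lands on it."""
    order, seen, i = [], set(), 1
    while len(order) < height * width:
        row = min(int(radical_inverse(i, 2) * height), height - 1)
        col = min(int(radical_inverse(i, 3) * width), width - 1)
        if (row, col) not in seen:
            seen.add((row, col))
            order.append((row, col))
        i += 1
    return order

order = halton_order(4, 4)
```

Compared with raster order, consecutive positions in this ordering tend to be far apart, which is the spatial-coverage property the abstract attributes to the Halton-guided design.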

2606.06261 2026-06-05 cs.NI cs.AI cs.ET cs.MA

DAST: A VLM-LLM Framework for Cross-Interface Anomaly Detection in O-RAN

Francesco Spinelli, Esteban Municio, Pau Baguer, Gines Garcia-Aviles, Xavier Costa-Perez

Affiliations: i2CAT Foundation; NEC Laboratories Europe; ICREA

AI summary: Proposes DAST, a zero-shot multi-agent framework whose three-stage VLM→LLM→VLM pipeline converts multivariate KPI streams into visual representations and applies domain knowledge for cross-interface anomaly detection, outperforming existing TSAD methods on a real O-RAN testbed.

Comments: 7 pages, 5 figures. This work has been submitted to the IEEE for possible publication

Abstract

O-RAN enables a disaggregated baseband stack with programmable functions that communicate over standardized open interfaces. The same openness that enables multi-vendor composition also expands the attack surface across logically decoupled tiers that make up the compute continuum. Among these threats, Denial-of-Service and performance-degradation attacks, which account for the majority of catalogued O-RAN threats, are particularly difficult to detect. Traditional Time-Series Anomaly Detection (TSAD) methods fail in this new regime where labelled baselines are scarce, threats evolve faster than detectors can be retrained, and the high-dimensional multivariate telemetry overwhelms monolithic inference models. To address these challenges, we present DAST, a zero-shot multi-agent framework for cross-interface anomaly detection in O-RAN that chains a three-stage VLM $\rightarrow$ LLM $\rightarrow$ VLM pipeline. DAST converts multivariate KPI streams into visual representations, scores textual per-interface descriptions against O-RAN domain knowledge, and verifies suspects on high-resolution heatmaps to output the problematic interfaces, the anomalous time intervals, an indicative O-RAN WG11-aligned operational impact rating and the decision rationale. We evaluate DAST on real network traces collected from an O-RAN testbed under representative performance degradation scenarios, achieving 0.910 F1-Score and 0.843 Accuracy, outperforming state-of-the-art TSAD baselines.

2606.06260 2026-06-05 cs.IR cs.AI cs.CL

OneReason Technical Report

OneRec Team, Biao Yang, Boyang Ding, Chenglong Chu, Dunju Zang, Fei Pan, Han Li, Hao Jiang, Honghui Bao, Huanjie Wang, Jian Liang, Jiangxia Cao, Jiao Ou, Jiaxin Deng, Jinghao Zhang, Kun Gai, Lu Ren, Peiru Du, Pengfei Zheng, Rongzhou Zhang, Ruiming Tang, Shiyao Wang, Siyang Mao, Siyuan Lou, Teng Shi, Wei Yuan, Wenlong Xu, Xingchen Liu, Xingmei Wang, Xinqi Jin, Yan Sun, Yan Wang, Yifei Hu, Yingzhi He, Yufei Ye, Yuhao Wang, Yunhao Zhou, Yuqin Dai, Zhao Liu, Zhipeng Wei, Zhixin Ling, Ziming Li, Zixing Zhang, Ziyuan Liu, An Zhang, Changxin Lao, Chaoyi Ma, Chengru Song, Defu Lian, Fan Yang, Guowang Zhang, Hao Peng, Jiayao Shen, Jie Chen, Jun Xu, Junmin Chen, Kun Zhang, Kuo Cai, Mingxing Wen, Minmao Wang, Minxuan Lv, Qi Zhang, Qiang Luo, Sheng Yu, Shijie Li, Shijie Yi, Shuang Yang, Shugui Liu, Shuni Chen, Tinghai Zhang, Tingting Gao, Xiang Wang, Xiangyu Wu, Xiangyu Zhao, Xiao Lv, Xiaoyou Zhou, Xuming Wang, Yong Du, Zejian Zhang, Zhaojie Liu, Zhiyang Zhang, Zhuang Zhuang, Ziqi Wang, Ziyi Zhao

Affiliations: OneRec Team

AI summary: To address the difficulty of activating reasoning ability in generative recommendation models, proposes OneReason, which achieves effective reasoning by strengthening perception and cognition.

Comments: Work in progress

Abstract

Generative recommendation models in the OneRec family have been widely deployed in many real-world services, such as short-video, live-streaming, advertising, and e-commerce. However, these generative models can only benefit from the scaling advantage, while their reasoning ability is hard to activate, since we cannot construct meaningful Chain-of-Thought (CoT) sequences consisting of itemic tokens only. Inspired by the success of the reasoning-style "think before answer" paradigm in the LLM field, we conduct preliminary studies (i.e., OneRec-Think, OpenOneRec) to explore reasoning capability in generative recommendation. Nevertheless, we notice an unexpected phenomenon: the thinking mode does not show advantages over the non-thinking mode. Drawing insights from recent findings on CoT robustness in multi-modal language models, we argue that effective reasoning in recommendation rests on two factors: perception, the ability to ground itemic tokens in their underlying language semantics, and cognition, the ability to reorganize a user's behavior sequence into coherent latent interest points. We therefore propose OneReason, which includes: (1) strong itemic token perception in pre-training, (2) a three-level cognition-enhanced CoT format for recommendation tasks in SFT, and (3) a specialize-then-unify training recipe in RL to enhance the thinking ability.

2606.06240 2026-06-05 cs.DB cs.AI

TOKI: A Bitemporal Operator Algebra for Contradiction Resolution in LLM-Agent Persistent Memory

Ziming Wang

Affiliations: The Hong Kong University of Science and Technology

AI summary: Proposes the TOKI algebra, which unifies four contradiction-resolution heuristics as bitemporal operators, provides a write-time concurrency-control contract through four soundness theorems spanning isolation, schema, and provenance, and reports the effect of the audit-row defence on LoCoMo.

Comments: 43 pages including full appendices (proofs, protocols, and reproducibility ledger). Code, data, and reproducibility artifact: https://github.com/ZenAlexa/toki-bitemporal-memory

Abstract

Persistent memory for an LLM agent is a write-heavy substrate: every belief update is a versioned write, and a new claim may contradict a stored one. Production systems use four resolution heuristics (last-writer-wins, evidence-weighted merge, await-confirmation, per-rule policy), yet none declares the isolation level it assumes or the write-time anomalies it admits. We show that contradiction resolution is write-time concurrency control and make the missing contract explicit. TOKI types the four heuristics as one family of bitemporal operators over a dual-row schema, each with an isolation precondition and a provenance annotation that preserves the losing fact in an audit row. Four soundness theorems close the contract across isolation, schema, and provenance, lift the guarantees to operator pipelines, and extend the fold operators to n-ary conflict sets. A tightness companion proves that, within the relational schedule model, keyed logging of the adjudicating judge is necessary for replay consistency, which every audited baseline omits. A verdict matrix over eight systems localizes the gap: every baseline that keeps a language-model judge on the write path admits at least one of three write-time anomalies (replay inconsistency, belief-drift skew, audit erasure); a content-addressed engine-layer comparator avoids them only by removing the judge, and TOKI alone excludes all three while keeping it. On its one natural-workload slice the audit-row defence moves LoCoMo by 0.86, and ablating the typed memory layer removes 0.49 accuracy on 1,444 answerable LoCoMo questions; the cross-system comparison stays underpowered and claims no superiority. The contribution is the contract: a write-time correctness specification, proved sound across isolation, schema, and provenance, pinning the guarantee every production heuristic assumes but no deployed system makes explicit.
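
The idea of resolving a contradiction while preserving the losing fact can be sketched with a small in-memory store. This is a toy last-writer-wins resolver with an audit list, written under assumed names (`Fact`, `Memory`, `last_writer_wins`) and an assumed two-timestamp layout; it does not reproduce TOKI's actual schema, operators, or isolation preconditions.

```python
from dataclasses import dataclass, field

@dataclass
class Fact:
    key: str
    value: str
    valid_from: int    # when the claim holds in the world (valid time)
    recorded_at: int   # when the memory stored it (transaction time)

@dataclass
class Memory:
    current: dict = field(default_factory=dict)  # key -> winning Fact
    audit: list = field(default_factory=list)    # losing facts, never deleted

    def last_writer_wins(self, new: Fact) -> None:
        """Keep the most recently recorded claim and log the contradicted one."""
        old = self.current.get(new.key)
        if old is None or new.recorded_at >= old.recorded_at:
            winner, loser = new, old
        else:
            winner, loser = old, new
        if loser is not None and loser.value != winner.value:
            self.audit.append(loser)  # provenance: the losing claim stays queryable
        self.current[new.key] = winner

memory = Memory()
memory.last_writer_wins(Fact("city", "Paris", valid_from=1, recorded_at=10))
memory.last_writer_wins(Fact("city", "Berlin", valid_from=5, recorded_at=20))
```

After the second write the store answers "Berlin" for the key, while the earlier "Paris" claim remains in the audit list rather than being overwritten silently.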

2606.06233 2026-06-05 stat.ML cs.LG stat.ME

Anchor PCA

Benedikt Seiter, Anya Fries, Julius von Kügelgen, Jonas Peters

发表机构 * ETH Zürich(苏黎世联邦理工学院)

AI总结 针对多领域数据,提出Anchor PCA方法,通过修改目标矩阵进行主成分分析,在保留共享变异方向的同时权衡整体解释方差,实现鲁棒的降维。

详情
AI中文摘要

主成分分析(PCA)是最广泛使用的无监督降维技术之一。我们研究来自多个相关领域的数据的PCA。由于主成分在不同领域通常不同,获得共享低秩嵌入的一种方法是对合并数据进行PCA。然而,这种方法可能关注仅在少数领域中表现出高变异的虚假方向。为了找到在未见但相似领域中仍能解释大部分方差的鲁棒嵌入,我们提出关注共享变异方向。为此,我们引入了Anchor PCA,它在整体解释方差与共享和领域特定低秩嵌入之间的一致性之间进行权衡。Anchor PCA相当于对修改后的目标矩阵进行PCA,因此可以高效求解。此外,我们证明Anchor PCA恢复最大不变子空间,并在有界领域特定协方差膨胀下允许极小极大重构解释。在具有时间漂移的模拟和真实气体传感器数据上,我们分别证明Anchor PCA恢复了最大不变子空间,并产生了比合并基线和最坏情况替代方法在未见领域上解释更多方差的嵌入。综合来看,这些发现确立了Anchor PCA作为从多领域数据进行鲁棒无监督降维的有前途的方法。

English abstract

Principal component analysis (PCA) is one of the most widely used unsupervised dimension reduction techniques. We study PCA for data from multiple related domains. Since principal components generally differ across domains, one way to obtain a shared low-rank embedding is to perform PCA on the pooled data. However, this approach can focus on spurious directions that exhibit high variation in only a few domains. To find a robust embedding that still explains most variance in unseen but similar domains, we propose instead to focus on shared directions of variation. To this end, we introduce Anchor PCA, which trades off overall explained variance against agreement between the shared and domain-specific low-rank embeddings. Anchor PCA amounts to PCA on a modified target matrix and thus can be solved efficiently. Moreover, we show that Anchor PCA recovers a maximal invariant subspace and admits a minimax reconstruction interpretation under bounded domain-specific covariance inflations. On simulated and real-world gas sensor data with temporal drift, we demonstrate, respectively, that Anchor PCA recovers the maximal invariant subspace and yields embeddings that explain more variance on unseen domains than the pooling baseline and a worst-case alternative. Taken together, these findings establish Anchor PCA as a promising approach to robust unsupervised dimension reduction from multi-domain data.

2606.06225 2026-06-05 cs.IR cs.AI cs.LG

Bridging the Semantic-Collaborative Gap: An Asymmetric Graph Architecture for Cold-Start Item Recommendation

弥合语义-协作鸿沟:面向冷启动物品推荐的非对称图架构

Anh Truong, John Trenkle, Yuanbo Chen, Honghong Zhao, Abdullah Alchihabi, Effy Fang, Michael Tamir

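Affiliations: Tubi; Kumo AI

AI summary: Proposes Shallow-RHS, an asymmetric link-prediction architecture in which the left-hand-side device tower captures collaborative signals through message passing over temporal watch history, while the right-hand-side content tower encodes content from intrinsic features only, addressing inductive graph completion for cold-start item recommendation.

Details
AI Chinese abstract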

协同过滤和基于图的推荐模型因利用观察到的用户交互而非常有效,但这种依赖性在新增内容没有交互历史时产生了根本性的冷启动挑战。在Tubi的生产检索系统中,这一挑战还受到服务接口的进一步限制:新内容必须立即分配独立的嵌入,并且模型必须产生适用于近似最近邻检索的设备嵌入。我们通过将冷启动推荐表述为时间二分设备-内容图上的归纳图补全问题来解决这一设置。我们提出Shallow-RHS,一种非对称链接预测架构,其中左端(LHS)设备塔利用时序有效的观看历史消息传递来捕获协作信号,而右端(RHS)内容塔相对于图是故意浅层的,仅从内在特征编码内容。RHS塔不使用基于ID的嵌入、内容侧子图、邻居聚合或交互派生的表示,迫使内容编码器将内在特征映射到协同过滤感知的嵌入空间。训练后,学习到的内容编码器为热内容和新增内容生成嵌入,通过检索热替代邻居实现隐式图补全。我们进一步将相同的表示补全原则扩展到设备冷启动,通过从人口统计特征构建基于群体的嵌入。大规模在线实验表明,在内容冷启动参与度、推广速度、印象获取和设备冷启动参与度方面持续相对改进。

English abstract

Collaborative filtering and graph-based recommendation models are highly effective because they leverage observed user interactions, but this dependence creates a fundamental cold-start challenge when newly added content has no interaction history. In Tubi's production retrieval system, this challenge is further constrained by the serving interface: new content must be assigned a standalone embedding immediately, and the model must also produce device embeddings suitable for approximate nearest-neighbor retrieval. We address this setting by formulating cold-start recommendation as an inductive graph-completion problem on a temporal bipartite device-content graph. We propose Shallow-RHS, an asymmetric link-prediction architecture in which the left-hand side (LHS) device tower leverages temporally valid watch-history message passing to capture collaborative signals, while the right-hand side (RHS) content tower is intentionally shallow with respect to the graph and encodes content solely from intrinsic features. The RHS tower does not use ID-based embeddings, content-side subgraphs, neighbor aggregation, or interaction-derived representations, forcing the content encoder to map intrinsic features into a collaborative-filtering-aware embedding space. After training, the learned content encoder generates embeddings for both warm and newly ingested content, enabling implicit graph completion through retrieval of warm surrogate neighbors. We further extend the same representation-completion principle to device cold-start by constructing cohort-based embeddings from demographic features. Large-scale online experiments demonstrate consistent relative improvements in content cold-start engagement, promotion speed, impression acquisition, and device cold-start engagement.

2606.06214 2026-06-05 cs.SE cs.AI

Towards the Readability of LLM-Generated Codes through Multitask Representation Engineering

面向大语言模型生成代码可读性的多任务表示工程

Huifan Gao, Liuhua He, Yinghui Pan, Shenbao Yu, Yifeng Zeng, Shengchao Qin, Weidi Sun

Affiliations: School of Aerospace Engineering, Xiamen University; School of Artificial Intelligence, Shenzhen University; College of Computer and Cyber Security, Fujian Normal University; Department of Computer & Information Sciences, Northumbria University; School of Computer Science and Technology, Xidian University; Peking University

AI summary: Proposes a multitask representation engineering framework that improves the readability of LLM-generated code with low data dependency and low computational cost, and analyzes theoretically its effect on the readability-correctness tradeoff.

Details
AI Chinese abstract

正确性和可读性是代码质量的关键指标,分别确保功能保真度和易于理解。虽然现有研究大多关注提高大语言模型(LLM)生成代码的正确性,但可读性仍未得到充分解决。由于其主观性,通过定向控制提高可读性具有挑战性。在本文中,我们采用表示工程(RepE)作为定向控制方法,因为它具有低数据依赖和低计算成本的特点。先前关于RepE的工作主要集中在单一任务的定向控制上,但提高代码可读性需要跨多个任务的控制。因此,我们提出了多任务RepE框架,并从理论上讨论了多任务引导方法对代码可读性和正确性之间权衡的影响。我们进一步提供了全面的实验支持。所有相关实现都是开源的,并可应要求提供。

English abstract

Correctness and readability are key measures of code quality, respectively ensuring functional fidelity and ease of comprehension. While most existing research focuses on improving the correctness of code generated by large language models (LLMs), readability remains under-addressed. Enhancing readability through targeted control is challenging due to its subjective nature. In this article, we employ representation engineering (RepE) as the targeted control method, given its low data dependency and low computational cost. Prior work on RepE has primarily focused on targeted control for a single task, but improving code readability requires control across multiple tasks. Accordingly, we propose a multitask RepE framework and theoretically discuss the impact of the multitask steering method on the tradeoff between code readability and correctness. We further provide comprehensive experiments in support. All the relevant implementations are open-source and available upon request.

2606.06183 2026-06-05 eess.AS cs.CL

Revisiting Lexicon Evaluation in Unsupervised Word Discovery

重新审视无监督词汇发现中的词汇评估

Simon Malan, Danel Slabbert, Herman Kamper

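Affiliations: Google Africa PhD Fellowship; South African National Research Foundation

AI summary: The common evaluation metric in unsupervised word discovery (normalized edit distance) is biased toward the quality of large clusters and ignores how true classes are distributed across clusters. The paper proposes two new metrics, a modified within-cluster consistency metric and an inverse distribution metric, and shows experimentally that they correlate better with the ground-truth distribution and are more robust.

Comments: 6 figures

Details
AI Chinese abstract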

从发现的类词单元构建词汇是零资源语音处理的核心目标。但我们的评估是否提供了词汇质量的可靠指示?一个常用指标——归一化编辑距离,平均每个簇中发现单元的音素编辑距离。我们表明该指标固有地偏向大簇的质量,阻碍了公平评估。此外,它忽略了真实类别在簇间的分布情况。基于聚类文献中的既定理论,我们提出了两个解决这些缺点的指标:一个修正的指标,在评估簇内一致性时权衡簇大小;以及一个逆指标,评估真实单词在簇间的分布。通过在合成和真实词汇上的实验,我们证明这些指标组合起来:(1)与词汇接近真实分布的程度更紧密相关,(2)对扭曲词汇评估的偏差更鲁棒。

English abstract

Building a lexicon from discovered word-like units is a central goal in zero-resource speech processing. But do our evaluations provide a trustworthy indication of lexicon quality? A common metric, normalized edit distance, averages the phoneme edit distances between discovered units in each cluster. We show that this metric has an inherent bias toward the quality of large clusters, inhibiting fair evaluation. Moreover, it ignores how well true classes are distributed across clusters. Based on established theory in clustering literature, we propose two metrics that address these shortcomings: a modified metric that weighs cluster size when assessing within-cluster consistency, and an inverse metric that assesses how true words are spread across clusters. Through experiments on synthetic and real-world lexicons, we demonstrate that combined, these metrics are: (1) more closely correlated with how similar a lexicon is to the ground-truth distribution, and (2) more robust to biases that skew lexicon evaluations.
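
The cluster-size bias described in the abstract can be illustrated with a small sketch. It assumes, as one common formulation that the abstract does not confirm, that normalized edit distance (NED) is pooled over all within-cluster pairs, so a cluster of size n contributes n(n-1)/2 pairs. The size-weighted variant is only one plausible reading of "weighs cluster size"; all function names and the toy phoneme strings are hypothetical.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two sequences (here: phoneme strings)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def ned(a, b):
    """Edit distance normalized by the longer sequence length."""
    return edit_distance(a, b) / max(len(a), len(b))


def pair_pooled_ned(clusters):
    """Average NED over all within-cluster pairs, pooled across clusters."""
    scores = [ned(a, b) for c in clusters for a, b in combinations(c, 2)]
    return sum(scores) / len(scores)


def size_weighted_ned(clusters):
    """Per-cluster mean NED, weighted linearly by cluster size (assumption)."""
    total, weight = 0.0, 0
    for c in clusters:
        pairs = [ned(a, b) for a, b in combinations(c, 2)]
        if pairs:
            total += len(c) * sum(pairs) / len(pairs)
            weight += len(c)
    return total / weight


# One large pure cluster and two small impure clusters (hypothetical data).
clusters = [["kat"] * 6, ["dog", "dig"], ["sun", "son"]]
pooled = pair_pooled_ned(clusters)
weighted = size_weighted_ned(clusters)
```

With these toy clusters the pooled score is dominated by the 15 pairs of the large cluster, while the size-weighted score gives the two small, impure clusters more influence.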

2606.06179 2026-06-05 stat.ML cs.LG

Diffusion Models Observe Only Gradients: A Geometric Perspective on Score Matching Errors

扩散模型仅观察梯度:分数匹配误差的几何视角

Naïl B. Khelifa, Richard E. Turner, Ramji Venkataramanan

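Affiliations: University of Cambridge

AI summary: Shows from a geometric perspective that the $L^2$ score error is not the right measure of marginal distributional quality. A Helmholtz-Hodge decomposition splits the score error into gradient and solenoidal components, of which only the gradient component affects the Fokker-Planck dynamics; the paper gives a KL divergence upper bound that depends only on the gradient component, together with a tractable estimator of that component.

Details
AI Chinese abstract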

基于分数的扩散模型通常通过最小化$L^2$分数匹配误差来训练,标准理论分析依赖该量来约束学习分布与目标分布之间的采样差异。我们证明$L^2$分数误差不是边缘分布质量的正确内在度量:一个学习的扩散模型可能在完美匹配目标分布的同时产生任意大的$L^2$分数误差。通过将分数误差分解为梯度分量和螺线管分量(Helmholtz-Hodge分解),我们识别出背后的几何原因:只有梯度分量进入边缘Fokker-Planck动力学,而螺线管分量在结构上不可见。我们在三个结果中精确阐述了这一点。首先,基于修正的几何,我们证明了一个不可能结果:没有$L^2$分数误差的单调函数能够一致地给出学习分布与目标分布之间任何散度的下界。其次,我们推导出Kullback-Leibler散度的上界,该上界仅依赖于误差的可观测梯度分量,从而收紧标准Girsanov界,并指出其宽松性源于在路径空间而非边缘空间动力学上操作的成本。第三,我们通过对偶Sobolev恒等式给出了梯度分量的可处理估计器,实验表明该估计器与样本质量的相关性显著优于完整的$L^2$误差。

English abstract

Score-based diffusion models are typically trained by minimizing the $L^2$ score matching error, and standard theoretical analyses rely on this quantity to bound the sampling discrepancy between the learned and target distributions. We show the $L^2$ score error is not the right intrinsic measure of marginal distributional quality: a learned diffusion model can incur arbitrarily large $L^2$ score error while perfectly matching the target distribution. By decomposing score errors into a gradient and a solenoidal component (a Helmholtz-Hodge decomposition), we identify the geometric reason behind this: only the gradient component enters the marginal Fokker-Planck dynamics, while the solenoidal component is structurally invisible. We make this precise in three results. First, building on the corrected geometry, we prove an impossibility result: no monotone function of the $L^2$ score error can uniformly lower bound any divergence between the learned and target distributions. Second, we derive an upper bound on the Kullback-Leibler divergence that depends only on the observable gradient component of the error, tightening the standard Girsanov bound and identifying its looseness as the cost of operating on path-space rather than marginal-space dynamics. Third, we give a tractable estimator of the gradient component via a dual Sobolev identity, which is shown to empirically correlate substantially better with sample quality than the full $L^2$ error.
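
The claim that a solenoidal error component is invisible to the marginal dynamics can be checked numerically in a toy case. The sketch below assumes a 2-D standard Gaussian density $p$ and a hypothetical rotational error field $r(x,y)=c\,(-y,x)$; neither is taken from the paper. It verifies by finite differences that $\nabla\cdot(p\,r)\approx 0$, so adding $r$ to a velocity field $v$ leaves a continuity equation $\partial_t p=-\nabla\cdot(p\,v)$ unchanged, while the squared $L^2(p)$ norm of $r$, which equals $2c^2$ here, grows without bound in $c$.

```python
import math


def density(x, y):
    """2-D standard Gaussian density."""
    return math.exp(-0.5 * (x * x + y * y)) / (2.0 * math.pi)


def rotation_field(x, y, c):
    """Hypothetical solenoidal error field r(x, y) = c * (-y, x)."""
    return (-c * y, c * x)


def weighted_divergence(x, y, c, h=1e-4):
    """Central-difference estimate of div(p * r) at the point (x, y)."""
    def flux_x(u, v):
        return density(u, v) * rotation_field(u, v, c)[0]

    def flux_y(u, v):
        return density(u, v) * rotation_field(u, v, c)[1]

    d_dx = (flux_x(x + h, y) - flux_x(x - h, y)) / (2.0 * h)
    d_dy = (flux_y(x, y + h) - flux_y(x, y - h)) / (2.0 * h)
    return d_dx + d_dy


def squared_l2_norm(c):
    """E_p[|r|^2] = c^2 * E[x^2 + y^2] = 2 * c^2 for the standard Gaussian."""
    return 2.0 * c * c


points = [(0.3, -0.7), (1.2, 0.5), (-0.4, 2.0)]
max_div = max(abs(weighted_divergence(x, y, c=100.0)) for x, y in points)
```

Here `max_div` stays at the level of finite-difference error even though `squared_l2_norm(100.0)` is large, which is the toy analogue of a large $L^2$ score error that does not move the marginal.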

2606.06171 2026-06-05 stat.ML cs.LG cs.NA math.NA physics.comp-ph

Effective Dimensionality as an Operator Invariant for Physics-Preserving Constraint Adaptation in Physics-Informed Neural Networks

有效维度作为物理信息神经网络中保物理约束适配的算子不变量

Cornelius Otchere, Michael Shields

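Affiliations: University of Cambridge

AI summary: Uses the Fisher Information Matrix to define the effective degrees of freedom $d_{eff}$ of a physics-constrained model, shows that it converges to the kernel dimension of the differential operator, and builds on this to propose subspace projection strategies for boundary-condition adaptation.

Details
AI Chinese abstract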

物理信息神经网络固有地遭受任务干扰,因为它们依赖共享参数空间来同时满足控制微分方程和边界条件。我们使用Fisher信息矩阵分析这种结构冲突,量化物理约束模型中的有效自由度($d_{eff}$)。与经典的$d_{eff}$(衡量相对于统计先验由数据提供信息的参数方向数量)不同,我们的$d_{eff}$衡量不受微分算子约束的参数方向维度。对于具有有限维核的算子,我们证明$d_{eff}$精确收敛于核维数,与网络宽度、深度或激活函数无关,将其从拟合诊断重新解释为底层连续算子的结构不变量。对于具有无限维核的算子,$d_{eff}$则衡量网络对该核的有限维表示带宽,而非恢复整数不变量。重要的是,$d_{eff}$还作为先验结构诊断。将适定问题的$d_{eff}$驱动到零,证明物理和边界约束已吸收网络的自由方向。基于这一表征,我们引入了用于边界适配的子空间投影策略。无需从头重新训练,我们将参数更新投影到预训练物理算子的零空间,使得新边界条件得到满足而不干扰已学习的物理。基于梯度的微调可以达到或超过此效果,但需要更多的挂钟时间和调参,而子空间投影在几秒到几分钟内提供近乎等效的质量。我们在线性和非线性算子上验证,展示了对初始和边界偏移以及未遇到约束类型的准确适配。

English abstract

Physics-Informed Neural Networks inherently suffer from task interference because they rely on a shared parameter space to satisfy both governing differential equations and boundary conditions. We analyze this structural conflict using the Fisher Information Matrix to quantify the effective degrees of freedom ($d_{eff}$) in a physics-constrained model. Unlike the classical $d_{eff}$ which measures how many parameter directions are informed by data against a statistical prior, our $d_{eff}$ measures the dimension of the parameter directions unconstrained by the differential operator. For operators with finite-dimensional kernel, we show that $d_{eff}$ converges to the kernel dimension exactly, independent of network width, depth, or activation function, recasting it from a fit diagnostic into a structural invariant of the underlying continuous operator. For operators with infinite-dimensional kernel, $d_{eff}$ instead measures the network's finite-dimensional representational bandwidth for that kernel rather than recovering an integer invariant. Importantly, $d_{eff}$ also serves as an a priori structural diagnostic. Driving $d_{eff}$ of a well-posed problem to zero certifies that the physics and boundary constraints have absorbed the network's free directions. Building on this characterization, we introduce subspace projection strategies for boundary adaptation. Rather than retraining from scratch, we project parameter updates into the null space of the pre-trained physics operator so that new boundary conditions are satisfied without disturbing the learned physics. Gradient-based fine-tuning can match or exceed this but needs more wall-clock time and tuning, whereas subspace projection delivers near-equivalent quality in seconds to minutes. We validate on linear and nonlinear operators, demonstrating accurate adaptation to initial and boundary shifts and unencountered constraint types.

2606.06159 2026-06-05 cs.AR cs.AI cs.NE

ITP-STDP: An Intrinsic-Timing Power-of-Two Learning Engine for On-Chip SNN Training

ITP-STDP:用于片上SNN训练的内在时序二次幂学习引擎

Haihang Xia, Xinyu Zhao, Xuecheng Wang, John Goodenough, Charith Abhayaratne, Panagiotis A. Panagiotou, Chunyi Song, Tiantai Deng

Affiliations: School of Electrical and Electronic Engineering, The University of Sheffield; Donghai Laboratory; Engineering Research Center of Oceanic Sensing Technology and Equipment, Ministry of Education; State Key Laboratory of Ocean Sensing and Ocean College, Zhejiang University

AI summary: To address the hardware resource and energy cost of compute-heavy weight updates in SNN training, proposes intrinsic-timing power-of-two STDP (ITP-STDP) and its hardware architecture. Algorithmic and hardware optimizations remove most of the computational overhead, yielding substantial energy-efficiency and speed gains on FPGA and ASIC platforms.

Comments: This work has been submitted to the IEEE for possible publication

Details
AI Chinese abstract

脉冲神经网络(SNN)有潜力成为第三代神经网络,并在广泛的应用中受到越来越多的关注。然而,SNN中大量的突触连接导致训练过程中片上学习算法的权重更新计算密集,从而消耗大量硬件资源和能量。在现有的SNN学习算法中,脉冲时序依赖可塑性(STDP)是研究最广泛、采用最多的算法之一,是SNN中的基本学习组件。为了解决SNN训练相关的硬件和能量开销,本文提出了内在时序二次幂STDP(ITP-STDP)及其对应的原型学习引擎硬件架构。通过专用的平均场突触漂移模型进行动力学分析,并在不同规模和数据集上的SNN网络中进一步验证。该设计在ASIC和FPGA平台上实现,并与包括原始STDP和更复杂STDP变体在内的最新方法进行比较。结果表明,由于所提出的设计通过算法和硬件级优化消除了STDP的大部分计算开销,因此具有优越的能效、更高的运行速度和显著更低的硬件资源利用率。在FPGA平台上,所提出的设计相比对比设计能效提高了4.5倍至219.8倍。在ASIC平台上,所提出的设计实现了4.8倍至22.01倍的加速,而面积仅为先前工作的1.2%至3.3%。

English abstract

Spiking neural networks (SNNs) have the potential to emerge as the third generation of neural networks and have attracted increasing attention across a wide range of applications. However, the large number of synaptic connections in SNNs leads to intensive weight-update computation by on-chip learning algorithms during training, resulting in substantial hardware resource utilization and energy consumption. Among existing SNN learning algorithms, spike-timing-dependent plasticity (STDP) is one of the most extensively studied and widely adopted, serving as a fundamental learning component in SNNs. To address the hardware and energy overheads associated with SNN training, this paper presents intrinsic-timing power-of-two STDP (ITP-STDP) and its corresponding prototype learning engine hardware architecture. The proposed design is evaluated through a dedicated mean-field synaptic drift model for dynamical analysis and further validated across SNN networks of different scales and datasets. It is further implemented on both ASIC and FPGA platforms and compared with state-of-the-art approaches, including the original STDP and more complex STDP variants. The results demonstrate superior energy efficiency, higher operating speed, and substantially lower hardware resource utilization, as the proposed design eliminates most of the computational overhead of STDP through both algorithmic and hardware-level optimizations. On the FPGA platform, the proposed design improves energy efficiency by 4.5$\times$ to 219.8$\times$ over the compared designs. On the ASIC platform, the proposed design achieves a 4.8$\times$ to 22.01$\times$ speedup while consuming only 1.2% to 3.3% of the area required by prior works.

2606.06136 2026-06-05 cs.SC cs.AI

A Finite Certificate for the Positive $n=9$ Vasc Inequality

正数 $n=9$ 的 Vasc 不等式的一个有限证书

Dakai Guo, Ruichen Qiu, Yichuan Cao, Ruyong Feng

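Affiliations: State Key Laboratory of Mathematical Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, and the School of Mathematics, University of Chinese Academy of Sciences

AI summary: With human-guided assistance from the AI agent MechMath Agent Team, reduces the rational inequality to a homogeneous polynomial inequality, parametrizes every sorted fixed-maximum cone by cumulative gaps, and produces a finite certificate covering all 40320 cones, proving the positive-real $n=9$ case of the Vasc cyclic inequality.

Details
AI Chinese abstract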

我们证明了正实数 $n=9$ 情况下的 Vasc 循环不等式。该证明是在 AI 代理 MechMath Agent Team 的人工引导协助下获得的:人类可读部分将有理不等式简化为齐次多项式不等式,固定一个循环最大值,并通过累积间隙参数化每个排序的固定最大值锥体;有限部分是一个覆盖所有 $8!=40320$ 个排序锥体的证书。MechMath Agent Team 通过 Python 工具调用生成了证书验证工作流,包括情况划分、验证程序和终端分类。已发布的证书包含 36815 个系数叶子、2236 个普通 Polya 乘子叶子和 1269 个 AM-GM 中点覆盖叶子。人类作者审计了数学简化和验证逻辑,一个单独的工件包含证书、独立验证器和从源代码重建的路径。

English abstract

We prove the positive-real $n=9$ case of the Vasc cyclic inequality. The proof was obtained with human-guided assistance from the AI agent MechMath Agent Team: the human-readable part reduces the rational inequality to a homogeneous polynomial inequality, fixes a cyclic maximum, and parametrizes each sorted fixed-maximum cone by cumulative gaps; the finite part is a certificate covering all $8!=40320$ sorted cones. MechMath Agent Team generated the certificate verification workflow through Python tool calls, including the case split, verification programs, and terminal classifications. The published certificate has $36815$ coefficient leaves, $2236$ ordinary Polya multiplier leaves, and $1269$ AM-GM midpoint overlay leaves. Human authors audited the mathematical reductions and verification logic, and a separate artifact contains the certificate, an independent verifier, and a from-source rebuild route.
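
The leaf counts quoted in the abstract can be cross-checked against the number of sorted cones. The short check below only redoes that arithmetic. It assumes each sorted cone is closed by exactly one leaf, which the matching totals suggest but the abstract does not state, and the variable names are illustrative.

```python
import math

# Leaf counts as quoted in the abstract.
coefficient_leaves = 36815
polya_multiplier_leaves = 2236
amgm_midpoint_leaves = 1269

total_leaves = coefficient_leaves + polya_multiplier_leaves + amgm_midpoint_leaves

# Orderings of the remaining 8 variables once the cyclic maximum is fixed.
sorted_cones = math.factorial(8)

print(total_leaves, sorted_cones)
```

Both numbers come out to 40320, so the three leaf types together account for every sorted cone under the one-leaf-per-cone assumption.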

2606.06133 2026-06-05 cs.SE cs.AI cs.LG cs.LO

TLA-Prover: Verifiable TLA+ Specification Synthesis via Preference-Optimized Low-Rank Adaptation

TLA-Prover: 通过偏好优化低秩适配实现可验证的 TLA+ 规范合成

Eric Spencer, Arslan Bisharat, Brian Ortiz, Khushboo Bhadauria, TaiNing Wang, George K. Thiruvathukal, Konstantin Laufer, Mohammed Abuhamad

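Affiliations: Department of Computer Science, Loyola University Chicago

AI summary: Proposes TLA-Prover, which combines supervised fine-tuning with repair-based group-relative policy optimization for TLA+ specification synthesis checked by the TLC model checker. The Gold/Diamond pass rate reaches 30%, roughly 3.5x the untuned baseline.

Comments: 12 pages, 5 tables, 3 figures. Submitted at the 21st International Conference on Software Technologies (ICSOFT 2026)

Details
AI Chinese abstract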

TLA+ 是一种用于验证分布式系统和安全关键协议的正式规范语言。大型语言模型(LLM)生成的 TLA+ 规范常常因语义原因无法通过 TLC 模型检查器。在 25 个 LLM 中,最佳公开基线的语法解析成功率为 26.6%,语义模型检查通过率为 8.6%。我们提出了 TLA-Prover,一个 200 亿参数的 TLA+ 规范合成模型。训练结合了在已验证示例上的监督微调(SFT)和基于修复的组相对策略优化(GRPO)。在 GRPO 阶段,模型学习修复自身被拒绝的规范。我们还从相同的 SFT 检查点训练了一个直接偏好优化(DPO)变体作为消融实验。TLC 直接提供奖励信号,无需学习奖励模型。每个输出分为四个等级:青铜(解析通过)、银(无警告)、金(通过 TLC)和钻石。要达到钻石级,模型的正确性属性会被自动微小修改;TLC 必须检测到违反。如果 TLC 仍然通过,则该属性始终为真且无贡献;输出无法达到钻石级。在一个保留的 30 问题基准上,TLA-Prover 在金级和钻石级均达到 9/30(即 pass@1 = 30%)。这大约是未调优基线 8.6% 的 3.5 倍。DPO 变体在钻石级达到 20%。金级和钻石级在每个检查点都一致;这防止了平凡属性失败模式。

English abstract

TLA+ is a formal specification language for verifying distributed systems and safety-critical protocols. Large language models (LLMs) frequently produce TLA+ specifications that fail the TLC model checker for semantic reasons. Across 25 LLMs, the best public baseline is 26.6% syntactic parse and 8.6% semantic model-check. We present TLA-Prover, a 20-billion-parameter model for TLA+ specification synthesis. Training combines supervised fine-tuning (SFT) on verified examples with repair-based group-relative policy optimization (GRPO). In the GRPO stage, the model learns to fix its own rejected specifications. We also train a direct preference optimization (DPO) variant from the same SFT checkpoint as an ablation. TLC provides the reward signal directly, with no learned reward model. Four tiers grade each output: Bronze (parses), Silver (no warnings), Gold (passes TLC), and Diamond. To reach Diamond, the model's correctness property is automatically altered in a small way; TLC must then detect a violation. If TLC still passes, the property was always-true and contributes nothing; the output fails Diamond. TLA-Prover reaches 9/30 (i.e. pass@1 = 30%) at both Gold and Diamond on a held-out 30-problem benchmark. This is roughly 3.5x the 8.6% untuned baseline. The DPO variant reaches 20% at Diamond. Gold and Diamond coincide at every checkpoint; this prevents the trivial-property failure mode.
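
The four-tier grading described above can be written as a small decision function. This is a sketch of the tier logic only: the boolean inputs are hypothetical stand-ins for the results of parsing, warning checks, a TLC run, and a TLC run on the automatically altered property, and nothing here invokes the real TLC tool. It also assumes the tiers are cumulative, which the abstract implies but does not spell out.

```python
def grade(parses, has_warnings, passes_tlc, mutation_detected):
    """Return the highest tier reached, or None if the output does not parse."""
    if not parses:
        return None
    if has_warnings:
        return "Bronze"
    if not passes_tlc:
        return "Silver"
    # Diamond: TLC must flag a violation once the property is slightly altered;
    # if it still passes, the property was always true and adds nothing.
    if not mutation_detected:
        return "Gold"
    return "Diamond"


def pass_rate(passed, total):
    """pass@1 as a percentage."""
    return 100.0 * passed / total


diamond_rate = pass_rate(9, 30)          # 9 of 30 held-out problems
ratio_to_baseline = diamond_rate / 8.6   # 8.6% untuned baseline from the abstract
```

The ratio works out to a little under 3.5, consistent with the "roughly 3.5x" figure in the abstract.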

2606.06117 2026-06-05 q-bio.QM cs.LG math.AT q-bio.GN

$p$-adic Bi-Filtrations for Topological Machine Learning on Genomic Sequences

$p$-adic 双过滤用于基因组序列的拓扑机器学习

Tirtharaj Dash, Gunja Sachdeva

Affiliations: Department of CS & IS, BITS Pilani, K K Birla Goa Campus; Department of Mathematics, BITS Pilani, K K Birla Goa Campus

AI summary: Proposes the pVR framework, which combines $p$-adic numbers with topological data analysis and extracts topological features of genomic sequences from a bi-filtered Vietoris-Rips complex, outperforming baselines on three of six low-sample datasets.

Comments: 12 pages, 5 figures, 8 tables

Details
AI Chinese abstract

我们引入 pVR,一种用于无比对基因组序列分类的拓扑机器学习框架,该框架将 $p$-adic 数与拓扑数据分析相结合。每条 DNA 序列沿两个互补轴编码:$k$-mer 前缀上的 $p$-adic 距离,捕捉层次化位置结构;以及 $k$-mer 频率上的组合 $L_1$ 距离,捕捉局部序列内容。这两个距离共同参数化一个双过滤 Vietoris-Rips 复形,来自该双过滤的每条序列的拓扑摘要作为标准机器学习分类器的特征。我们为该构造建立了理论保证:在度量扰动下的稳定性以及对素数选择的不变性,同时还有一个结果解释了为什么单个 $p$-adic 轴在拓扑上无信息,而双过滤能恢复非平凡同调。在十二个基因组基准测试(28 到 500 条序列,3 到 7 个类别)上,pVR 在六个低样本数据集中的三个上优于四种已建立的无对齐基线方法,提升高达 21 个百分点;仅在一个 SARS-CoV-2 变异基准测试上表现不佳,该基准测试的点突变偏离违反了层次化假设,并且所有方法在大样本情况下均达到饱和。pVR 还在三个低样本基准测试上优于 5 亿参数 Nucleotide Transformer v2 的零样本冻结嵌入,提升 6.7 到 11.4 个百分点。pVR 代码库公开于 https://github.com/MAHI-Group/pVR。

English abstract

We introduce pVR, a topological machine learning framework for alignment-free genomic sequence classification that combines $p$-adic numbers with topological data analysis. Each DNA sequence is encoded along two complementary axes: a $p$-adic distance on $k$-mer prefixes, which captures hierarchical positional structure, and a compositional $L_1$ distance on $k$-mer frequencies, which captures local sequence content. The two distances jointly parameterise a bi-filtered Vietoris--Rips complex, and per-sequence topological summaries from this bi-filtration serve as features for standard machine learning classifiers. We establish theoretical guarantees for the construction: stability under metric perturbations and invariance to the choice of prime, alongside a result that explains why a single $p$-adic axis is topologically uninformative and why the bi-filtration recovers nontrivial homology. On twelve genomic benchmarks ($28$ to $500$ sequences, $3$ to $7$ classes), pVR outperforms four established alignment-free baselines on three of six low-sample datasets, with gains of up to $21$ percentage points; it underperforms only on a SARS-CoV-2 variant benchmark whose point-mutation divergence violates the hierarchical assumption, and all methods saturate in the large-sample regime. pVR also outperforms zero-shot frozen embeddings from the 500M-parameter Nucleotide Transformer v2 by $6.7$ to $11.4$ percentage points on three low-sample benchmarks. The pVR codebase is publicly available at https://github.com/MAHI-Group/pVR.
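
The two distances that parameterise the bi-filtration can be sketched in a few lines. The $p$-adic distance below is the standard $|a-b|_p = p^{-v_p(a-b)}$ on integers. How pVR maps $k$-mer prefixes to integers is not given in the abstract, so the base-4 encoding here is purely an assumption, as are all function names.

```python
from collections import Counter

NUCLEOTIDE_DIGIT = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_prefix_first(seq):
    """Base-4 integer, first nucleotide as lowest-order digit (assumed encoding)."""
    return sum(NUCLEOTIDE_DIGIT[ch] * 4 ** i for i, ch in enumerate(seq))


def p_adic_valuation(n, p):
    """Largest e such that p**e divides the nonzero integer n."""
    n = abs(n)
    e = 0
    while n % p == 0:
        n //= p
        e += 1
    return e


def p_adic_distance(a, b, p):
    """|a - b|_p = p**(-v_p(a - b)), with distance 0 when a == b."""
    if a == b:
        return 0.0
    return float(p) ** (-p_adic_valuation(a - b, p))


def kmer_l1_distance(s, t, k):
    """L1 distance between normalized k-mer frequency vectors."""
    def freqs(seq):
        counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
        total = sum(counts.values())
        return {kmer: c / total for kmer, c in counts.items()}

    fs, ft = freqs(s), freqs(t)
    return sum(abs(fs.get(m, 0.0) - ft.get(m, 0.0)) for m in set(fs) | set(ft))


# Sequences sharing a longer prefix are closer in the 2-adic distance.
near = p_adic_distance(encode_prefix_first("ACGT"), encode_prefix_first("ACGA"), 2)
far = p_adic_distance(encode_prefix_first("ACGT"), encode_prefix_first("TCGT"), 2)
```

Under this encoding a shared prefix makes the integer difference divisible by a high power of 2, so `near` is much smaller than `far`; that is one way a $p$-adic distance can reflect hierarchical positional structure.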

2606.06056 2026-06-05 cs.SE cs.AI cs.LG

Metamorphic Testing with the Rashomon Set: Explanation Faithfulness in Machine Learning

使用Rashomon集的蜕变测试:机器学习中的解释忠实性

Helge Spieker, Jørn Eirik Betten, Arnaud Gotlieb

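Affiliations: Norwegian Ministry of Education and Research

AI summary: To address unreliable explanations caused by the Rashomon effect in machine learning, proposes a metamorphic-testing framework that assesses the faithfulness of feature attributions from post-hoc explanation methods without ground-truth labels.

Comments: Accepted at 10th International Workshop on Metamorphic Testing (MET 2026)

Details
AI Chinese abstract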

多个机器学习模型在同一任务上可以达到近乎相等的预测性能,但提供不同的基于特征的解释。这被称为(可解释)机器学习的Rashomon效应,它引发了哪些解释(如果有的话)是可信的问题。我们提出了一个基于蜕变测试的框架,该框架通过探索后验解释方法中的归因特征重要性来评估解释忠实性,无需真实标签。五个蜕变关系形式化了模型行为与特征归因之间的预期一致性属性。我们将这个通用框架应用于两个表格回归数据集和两个后验解释器(SHAP和LIME)以演示该方法。该框架提供了一个实用的、模型无关的工具,用于选择具有可靠和可信解释的准确模型。

English abstract

Multiple machine learning models can achieve near-equivalent predictive performance on the same task, yet provide divergent feature-based explanations. This is called the Rashomon effect of (explainable) machine learning, and it raises the question of which explanations, if any, are trustworthy. We propose a framework based on metamorphic testing that assesses explanation faithfulness without requiring ground-truth labels by exploring attributed feature importance from post-hoc explanation methods. Five metamorphic relations formalize expected consistency properties between model behavior and feature attributions. We apply this general framework to two tabular regression datasets and two post-hoc explainers (SHAP and LIME) to demonstrate the approach. The framework offers a practical, model-agnostic tool for selecting accurate models with reliable and trustworthy explanations.

2606.06043 2026-06-05 stat.ML cs.LG

Adaptive Learning Rates with Surrogate Probability for Follow-the-Perturbed-Leader

基于替代概率的自适应学习率用于跟随扰动领导者

Jongyeong Lee, Junya Honda, Shinji Ito, Chansoo Kim

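Affiliations: Korea Institute of Science and Technology; Kyoto University; RIKEN AIP; The University of Tokyo; University of Science and Technology

AI summary: Because the optimization-free nature of FTPL makes adaptive, probability-dependent learning rates hard to design, proposes adaptive learning rates based on surrogate probability functions, obtaining best-of-both-worlds guarantees under Pareto perturbations for any shape parameter $α>1$ and extending them to bandits with expert advice.

Comments: TBA COLT2026

Details
AI Chinese abstract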

跟随正则化领导者框架在在线学习问题中显示出有效性和灵活性,其中学习率的选择至关重要。最近,通过求解凸优化获得的、基于臂选择概率定义的自适应学习率,在各种赌博机问题中实现了改进的最佳双世界(BOBW)保证。相比之下,其计算高效替代方案——跟随扰动领导者(FTPL)的BOBW保证仍然相对有限,因为其无优化特性讽刺地使得设计自适应的、概率依赖的学习率变得非平凡。为了解决这一挑战,我们通过引入替代概率函数为FTPL提出了一种自适应学习率,该函数仅从可用量计算,无需精确概率。基于这些具有替代函数的学习率,我们为具有任意形状参数$α>1$的Pareto扰动的FTPL提供了BOBW保证,推广了先前仅限于特定选择$α=2$的结果。我们进一步展示了在具有专家建议的赌博机问题中,具有自适应学习率的FTPL的BOBW保证。我们的方法保留了FTPL的计算简单性,同时实现了概率依赖的自适应性,并且基于替代的方法论可能在其他算法框架(超越FTPL和学习率设计)中具有独立意义。

English abstract

The follow-the-regularized-leader framework has shown effectiveness and flexibility in online learning problems, where the choice of learning rates is known to be crucial. Recently, adaptive learning rates defined in terms of the arm-selection probabilities, obtained by solving a convex optimization problem, have achieved improved best-of-both-worlds (BOBW) guarantees in various bandit problems. In contrast, BOBW guarantees for its computationally efficient alternative, follow-the-perturbed-leader (FTPL), remain relatively limited, since its optimization-free nature ironically makes the design of adaptive, probability-dependent learning rates non-trivial. To address this challenge, we propose an adaptive learning rate for FTPL by introducing surrogate probability functions that can be computed from the available quantities alone, without requiring the exact probabilities. Based on these learning rates with surrogate functions, we provide the BOBW guarantee for FTPL with Pareto perturbations for any shape parameter $α>1$, generalizing prior results restricted to the specific choice $α=2$. We further show BOBW guarantees for FTPL with adaptive learning rates in the bandit problem with expert advice. Our approach preserves the computational simplicity of FTPL while enabling probability-dependent adaptivity, and the surrogate-based methodology may be of independent interest in other algorithmic frameworks beyond FTPL and learning-rate design.
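
A single round of FTPL with Pareto perturbations can be sketched as follows. This shows only the generic perturb-then-pick-the-leader step with a fixed learning rate `eta`; the paper's surrogate-probability adaptive learning rate is not reproduced, and the exact scaling of the perturbation is an assumption. `random.paretovariate` is the Python standard-library Pareto sampler.

```python
import random


def ftpl_choose_arm(cumulative_losses, eta, alpha, rng):
    """Pick the arm minimizing cumulative loss minus a scaled Pareto perturbation."""
    perturbed = [
        loss - rng.paretovariate(alpha) / eta
        for loss in cumulative_losses
    ]
    return min(range(len(perturbed)), key=perturbed.__getitem__)


rng = random.Random(0)
losses = [0.0, 1000.0, 1000.0]  # hypothetical cumulative loss estimates
picks = [ftpl_choose_arm(losses, eta=1.0, alpha=2.0, rng=rng) for _ in range(200)]
share_of_best_arm = picks.count(0) / len(picks)
```

No optimization problem is solved per round, and the exact arm-selection probabilities are never computed in this step, which is why a probability-dependent learning rate needs a surrogate.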

2606.05986 2026-06-05 cs.CR cs.AI

AttackPathGNN: Cross-function vulnerability detection in smart contracts using state interference graphs and conjunction pooling

AttackPathGNN:使用状态干扰图和合取池化的智能合约跨函数漏洞检测

Gabriela Dobrita, Simona-Vasilica Oprea, Adela Bara

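Affiliations: Bucharest University of Economic Studies

AI summary: Proposes AttackPathGNN, a graph neural network that uses a State Interference Graph and conjunction pooling to recast vulnerability detection as reasoning over explicit attack paths, detecting cross-function vulnerabilities with high accuracy.

Details
AI Chinese abstract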

现有的基于学习的Solidity智能合约检测器将漏洞检测简化为单个函数内的语法模式匹配,然而许多最重大的利用(The DAO、Cream Finance)并不存在于任何单个函数中,而是存在于函数之间的关系以及使攻击可行的条件组合中。因此,我们提出了AttackPathGNN,一种图神经网络(GNN),它将检测重新定义为对显式攻击路径的推理。两个架构选择使其区别于先前的基于GNN的检测器:(1)状态干扰图,它通过类型化、加权边以及由显式五条件谓词定义的有向重入路径边,连接每一对共享可变存储的函数;(2)合取池化,一种对八个命名利用前提条件进行可微AND聚合的机制,其log-sigmoid形式使得当任何单一缓解措施(重入守卫、访问控制修饰符或SafeMath)到位时,每个函数的利用分数会骤降。在五次独立训练运行中,AttackPathGNN在SmartBugs Wild保留测试分区上达到92.3±0.2%的F1分数(假阴性率4.3±0.3%,在独立人工标注的SmartBugs Curated基准上检测率为90.8±2.5%),在每次随机种子下以100%的召回率恢复了6/10的DASP10类别,重入检测召回率为98.7±1.8%。每个预测都附带结构化的修复报告,将每个判定转化为可操作的、函数级的审计发现。

English abstract

Existing learning-based detectors for Solidity smart contracts reduce vulnerability detection to syntactic pattern matching within single functions, yet many of the most consequential exploits (The DAO, Cream Finance) exist not in any individual function but in the relationship between functions and in the combination of conditions that make the attack feasible. Thus, we propose AttackPathGNN, a graph neural network (GNN) that reframes detection as reasoning over explicit attack paths. Two architectural choices distinguish it from prior GNN-based detectors: (1) a State Interference Graph that links every pair of functions sharing mutable storage through typed, weighted edges and through directed reentrancy-path edges defined by an explicit five-condition predicate; (2) conjunction pooling, a differentiable AND-aggregator over eight named exploit preconditions whose log-sigmoid form causes the per-function exploit score to collapse whenever any single mitigation (a reentrancy guard, an access-control modifier or SafeMath) is in place. Across five independent training runs, AttackPathGNN attains 92.3±0.2% F1 on the SmartBugs Wild held-out test partition (4.3±0.3% false-negative rate, 90.8±2.5% detection rate on the independently human-labelled SmartBugs Curated benchmark), recovering 6/10 DASP10 categories at 100% on every seed and Reentrancy at 98.7±1.8%. Each prediction is emitted with a structured remediation report, turning each verdict into an actionable, function-level audit finding.
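
Conjunction pooling, as described, is a differentiable AND over precondition scores in log-sigmoid form. The following is a minimal sketch of that idea, assuming each precondition is represented by a real-valued logit and that the aggregate is the sum of log-sigmoids (equivalently, the product of sigmoids). The logits and names are hypothetical, and the paper's actual parameterization may differ.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def conjunction_pool(logits):
    """Soft AND: exp(sum_i log sigmoid(z_i)) = prod_i sigmoid(z_i)."""
    return math.exp(sum(log_sigmoid(z) for z in logits))


# Eight precondition logits (hypothetical): all strongly satisfied.
all_met = [6.0] * 8
# Same, but one precondition is blocked by a mitigation such as a reentrancy guard.
one_blocked = [6.0] * 7 + [-6.0]

score_exploitable = conjunction_pool(all_met)
score_mitigated = conjunction_pool(one_blocked)
```

Because the aggregate is a product, a single strongly negative logit is enough to pull the score close to zero, which matches the collapse behaviour the abstract attributes to the log-sigmoid form.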

2606.05966 2026-06-05 cs.DB cs.AI

Causal Scaffolding for Physical Reasoning: A Benchmark for Causally-Informed Physical World Understanding in VLMs

物理推理的因果支架:面向视觉语言模型中因果启发的物理世界理解基准

Tianyi Tang, Zhuoyi Lin, Zeyu Feng, Tianyi Ma, Yew-Soon Ong, Ivor Tsang, Haiyan Yin

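Affiliations: CFAR; IHPC; Agency for Science, Technology and Research (A*STAR); Nanyang Technological University

AI summary: Introduces the CausalPhys benchmark (over 3,000 video- and image-based questions with causal graphs), designs a causal-graph-grounded metric for evaluating VLM reasoning, and proposes Causal Rationale-informed Fine-Tuning (CRFT) to improve reasoning accuracy and interpretability.

Comments: Accepted by KDD 2026 Dataset and Benchmark Track

Details
AI Chinese abstract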

理解和推理物理世界是智能行为的基础,但最先进的视觉语言模型(VLM)在因果物理推理中仍会失败,常常产生看似合理但错误的答案。为解决这一问题,我们引入了CausalPhys,一个包含超过3000个精心策划的视频和图像问题的基准,涵盖四个领域:感知、预期、干预和目标导向。每个问题都配有一个专家注释的因果图,捕捉对象-属性-事件依赖关系,从而实现可解释且细粒度的因果理解评估。在此基础上,我们制定了一个因果图接地度量,定量衡量模型的思维链推理与正确因果关系的对齐程度,超越了仅基于答案的准确性,并能够系统诊断VLM的因果推理失败。使用该度量,我们对领先的VLM进行了全面分析,揭示了在捕捉因果依赖关系方面的系统性差距,并强调了因果感知学习的必要性。为解决这些局限性,我们进一步提出了因果理性微调(CRFT),明确将VLM推理与因果结构对齐。大量实验表明,CRFT在多个模型骨干上显著提升了推理准确性和可解释性。通过统一数据集整理、因果评估和因果感知学习,CausalPhys为推进现代VLM实现因果接地物理推理奠定了坚实基础。

English abstract

Understanding and reasoning about the physical world is the foundation of intelligent behavior, yet state-of-the-art vision-language models (VLMs) still fail at causal physical reasoning, often producing plausible but incorrect answers. To address this gap, we introduce CausalPhys, a benchmark of over 3,000 carefully curated video- and image-based questions spanning four domains: Perception, Anticipation, Intervention, and Goal Orientation. Each question is paired with an expert-annotated causal graph capturing object-attribute-event dependencies, enabling interpretable and fine-grained evaluation of causal understanding. Building on this, we formulate a causal-graph-grounded metric that quantitatively measures how well a model's chain-of-thought reasoning aligns with the correct causal relations, moving beyond answer-only accuracy and enabling systematic diagnosis of VLMs' causal reasoning failures. Using this metric, we conduct a comprehensive analysis of leading VLMs, revealing systematic gaps in capturing causal dependencies and underscoring the need for causality-aware learning. To address these limitations, we further propose Causal Rationale-informed Fine-Tuning (CRFT), which explicitly aligns VLM reasoning with causal structures. Extensive experiments demonstrate that CRFT substantially enhances both reasoning accuracy and interpretability across multiple model backbones. By unifying dataset curation, causal evaluation, and causality-informed learning, CausalPhys establishes a strong foundation for advancing modern VLMs toward causally grounded physical reasoning.

2606.05942 2026-06-05 stat.ML cs.LG

EML-CD: Causal Mechanism Recovery via EML Symbolic Trees in Structure Learning

Sota Asanuma

Affiliations: SoftBank Corp

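AI summary: Proposes EML-CD, a framework that uses the EML operator to build interpretable symbolic trees for causal mechanisms and automatically discovers closed-form causal equations from data, balancing mechanism recovery and structure learning on real and synthetic data.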

English abstract

Neural network (NN)-based nonlinear causal discovery methods recover DAG structure but leave each causal mechanism as a black box. Waxman et al. argued that extracting causal mechanisms from NN weights is ill-posed. We propose EML-CD, a framework that integrates the EML operator (capable of composing elementary functions from a single binary operator) into causal structure learning, with interpretable mechanism recovery as the primary objective. EML-CD represents each edge mechanism as a gated EML binary tree and automatically discovers closed-form causal equations. Analytical Jacobians can be directly computed from the output equations, enabling quantitative understanding of causal effects. On real data (Sachs protein signaling, d=11), EML-CD achieves SHD=11.2 +/- 0.4 (5-seed mean; baselines are single deterministic runs), on par with PC/GES within seed variance and below CAM, while attaching closed-form equations to each detected edge (precision 0.756, recall 0.365). In a controlled bivariate test with known mechanisms, EML-CD recovers 10 of 11 elementary function families faithfully (held-out shape correlation >= 0.96; only high-frequency sine is partial). On a symbolic synthetic benchmark, EML-CD attains a substantially lower and more stable held-out mechanism f-MSE than a fixed SINDy dictionary (mean 3.67 vs. 7644, the latter inflated by catastrophic extrapolation on one seed), although its structure recovery (SHD 14.0) only matches the dictionary and stays below specialized optimizers; on the Causal Chambers light-tunnel subset, a depth-2 model improves F1 over linear OLS-BIC (0.444 vs. 0.273).
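
The abstract reports structure recovery as SHD and edge precision/recall. As a reading aid, the sketch below shows one common way to compute these from adjacency matrices. It is not taken from the paper: the function names, the toy graphs, and the convention that a reversed edge counts once toward SHD are all assumptions made for illustration.

```python
import numpy as np

def shd(true_adj, est_adj):
    """Structural Hamming distance between two directed graphs.

    Assumed convention: each node pair contributes at most 1, so a
    reversed edge counts once (some papers count it twice).
    """
    true_adj = np.asarray(true_adj, dtype=bool)
    est_adj = np.asarray(est_adj, dtype=bool)
    diff = true_adj != est_adj
    # A node pair is mismatched if either direction disagrees.
    pair_mismatch = diff | diff.T
    # Count each unordered pair once (strict upper triangle).
    return int(np.triu(pair_mismatch, k=1).sum())

def edge_precision_recall(true_adj, est_adj):
    """Directed-edge precision and recall of the estimated graph."""
    true_adj = np.asarray(true_adj, dtype=bool)
    est_adj = np.asarray(est_adj, dtype=bool)
    tp = (true_adj & est_adj).sum()
    precision = tp / est_adj.sum() if est_adj.sum() else 0.0
    recall = tp / true_adj.sum() if true_adj.sum() else 0.0
    return float(precision), float(recall)

# Toy example (hypothetical graphs): adj[i, j] = 1 means an edge i -> j.
true_adj = np.array([[0, 1, 1],
                     [0, 0, 1],
                     [0, 0, 0]])
est_adj = np.array([[0, 1, 0],
                    [0, 0, 0],
                    [0, 1, 0]])

# SHD is 2 here: one reversed edge (1 -> 2) and one missing edge (0 -> 2).
distance = shd(true_adj, est_adj)
precision, recall = edge_precision_recall(true_adj, est_adj)
print(distance, precision, recall)
```

Lower SHD is better, while higher precision and recall are better, which is worth keeping in mind when reading the comparisons with PC/GES and CAM.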

2606.05920 2026-06-05 cs.SE cs.CL

Asuka-Bench: Benchmarking Code Agents on Underspecified User Intent and Multi-Round Refinement

Xin Wang, Liangtai Sun, Yaoming Zhu, Shuang Zhou, Jiaxing Liu, Fengjiao Chen, Lin Qiu, Xuezhi Cao, Xunliang Cai, Licheng Zhang, Zhendong Mao

Affiliations: University of Science and Technology of China; Independent researchers

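AI summary: Proposes Asuka-Bench, a benchmark that evaluates code agents in a closed loop combining underspecified user intent with multi-round refinement; it comprises 50 web tasks and 784 evaluation criteria and reveals substantial performance differences between models.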

Comments: under review

English abstract

Existing code-generation benchmarks score a single mapping from a complete prompt to a one-shot output. However, real web development is different. Users seldom write a full spec at the start; many requirements only become clear once they look at an intermediate result and react to it. We present Asuka-Bench, a benchmark that pairs underspecified user intent with multi-round refinement, grounded in browser-rendered behavior. Each task is resolved through a closed loop: a Code Agent generates a web project, a UI Agent executes test cases on the deployed site, and a User LLM turns evaluation outcomes into natural-language feedback for the next round. The benchmark comprises 50 web tasks with 784 evaluation criteria and 2402 expected outcomes. We benchmark 8 LLMs across 2 agent frameworks. The results separate models clearly: weighted Task Pass Rate varies by 38 percentage points and models also differ substantially in their ability to repair from feedback. Asuka-Bench is also far from saturated: even the strongest model completes only 52% of projects after three rounds.

2606.05871 2026-06-05 cs.IT cs.AI math.IT stat.ME

Compositional Boundaries for Density Fusion

Ratan Bahadur Thapa, Ali Darijani, Jürgen Beyerer, Steffen Staab

Affiliations: University of Stuttgart, Department of Computer Science, Germany; KIT, Department of Computer Science, Germany; Fraunhofer IOSB, Fraunhofer-Gesellschaft, Germany; University of Southampton, Department of Computer Science, United Kingdom

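AI summary: Studies order invariance of hierarchical fusion of weighted probability densities in distributed uncertainty-management systems, proves that among continuous binary rules order-invariant hierarchical fusion is equivalent to normalized weighted linear pooling, and identifies a local geometric obstruction for endpoint-to-candidate f-divergence balancing.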

English abstract

Distributed uncertainty-management systems often combine local probabilistic models along aggregation trees chosen by communication, privacy, or scheduling constraints. The final density should depend on the weighted sources, not on the particular order in which intermediate nodes combine them. We study this requirement as an algebraic compositionality problem for binary fusion of weighted probability densities. The central question is when a local fusion rule can be executed hierarchically while remaining order-invariant. We establish a compositional boundary for local segment-valued fusion rules. Within the class of continuous binary rules with additive output weights and weight-only coefficients, order-invariant hierarchical execution characterizes normalized weighted linear pooling; norm-induced segment balancing realizes the corresponding coefficient. Smooth endpoint-to-candidate $f$-divergence balancing has a different local geometry: its quadratic expansion induces square-root effective weights, showing why pairwise solvability alone is insufficient for schedule-independent fusion. We show that this obstruction is local to endpoint-to-candidate binary balancing, whereas global divergence barycenters retain additive-weight local limits. Finally, Gaussian mixtures show how the same issue appears in finite model classes: exact fusion is compositional, whereas stepwise compression is compositional only under a congruence condition on unnormalized component measures. These results distinguish exact schedule-independent fusion from global aggregation objectives and local approximation heuristics.
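
To make the order-invariance requirement concrete, here is a small numerical sketch. It compares normalized weighted linear pooling with a hypothetical binary rule whose mixing coefficients use square roots of the weights. The square-root rule, the function names, the Gaussian sources, and the weights are all illustrative assumptions and are not the paper's construction; the rule is only meant to show what a schedule-dependent binary rule can look like.

```python
import numpy as np

def linear_pool(a, b):
    """Normalized weighted linear pooling of two (weight, density) pairs."""
    (wa, pa), (wb, pb) = a, b
    w = wa + wb  # additive output weight
    return w, (wa * pa + wb * pb) / w

def sqrt_pool(a, b):
    """Hypothetical rule: additive output weight, sqrt-weight coefficients."""
    (wa, pa), (wb, pb) = a, b
    ca, cb = np.sqrt(wa), np.sqrt(wb)
    return wa + wb, (ca * pa + cb * pb) / (ca + cb)

def gaussian(x, mu, sigma):
    """Gaussian density evaluated on the grid x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Three weighted sources on a shared grid (weights chosen arbitrarily).
x = np.linspace(-10.0, 10.0, 2001)
p = (1.0, gaussian(x, -2.0, 1.0))
q = (4.0, gaussian(x, 0.0, 1.0))
r = (9.0, gaussian(x, 3.0, 1.0))

def schedule_gap(rule):
    """Largest pointwise difference between two aggregation orders."""
    _, left = rule(rule(p, q), r)   # ((p, q), r)
    _, right = rule(p, rule(q, r))  # (p, (q, r))
    return float(np.max(np.abs(left - right)))

linear_gap = schedule_gap(linear_pool)  # zero up to floating-point error
sqrt_gap = schedule_gap(sqrt_pool)      # clearly nonzero
print(linear_gap, sqrt_gap)
```

With linear pooling, both aggregation trees reduce to the same mixture with coefficients proportional to the source weights, so the gap vanishes up to rounding. With the square-root rule, the result depends on which pair is merged first.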