arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.15482 2026-06-16 stat.ML cs.LG New submission

Ricci-Filtration: Boosting Retrieval-Augmented Generation Reranker to Query-Answer Tasks by Discrete Ricci Flow

Tian Qin, Wei-Min Huang

Affiliations: arXiv.org, Tian Qin, Wei-Min Huang

AI summary: Proposes Ricci-Filtration, a geometric enhancement for reranking based on discrete curvature and Ricci flow. It models the query and the retrieved chunks as a network and uses curvature to filter out noisy chunks, significantly improving RAG generation performance.

Abstract

Ricci flow is a curvature-guided diffusion process that deforms space by shrinking regions of high positive curvature and expanding those with negative curvature. Similarly, discrete Ricci flow on weighted graphs modifies edge weights by shrinking edges with positive Ricci curvature and stretching those with negative Ricci curvature, effectively increasing the separation between clusters. Inspired by these two cornerstone works, we propose a geometry-based RAG reranker enhancement procedure called Ricci-Filtration. By modeling the input query and initial retrieved chunks as a network, where the input query and chunks serve as nodes and embedding-based pairwise relations define an initial graph, Ricci-Filtration leverages discrete curvature and Ricci flow to evaluate the structural importance of each chunk with respect to the user query. The system first filters the initial chunks based on their geometric curvature relative to the query; then, a reranker processes the remaining chunks to enhance generative performance. We theoretically prove that normalized discrete Ricci flow can detect community structures by identifying distinct asymptotic behaviors in edge weights. This supports the removal of "noisy" document chunks characterized by large weights and negative Ricci curvature relative to the query node. Extensive experiments confirm that Ricci-Filtration outperforms several baseline reranking methods in accuracy, precision, recall, and F1 scores. Furthermore, ablation studies demonstrate that Ricci-Filtration generally outperforms the baseline under various settings, highlighting the framework's robustness across different architectures.

2606.15454 2026-06-16 eess.AS cs.SD New submission

Phonetically Explainable Speech Deepfake Detection

Manasi Chhibber, Jagabandhu Mishra, Tomi H. Kinnunen

Affiliations: University of Eastern Finland

AI summary: Proposes a phoneme-guided cross-attention framework that turns speech deepfake detection into an interpretable, phonetically grounded process. Factorizing the spoofing posterior makes the contribution of each phonetic class visible, and experiments on several datasets confirm that phonetic classes differ in discriminative power.

Abstract

Speech deepfake detection is predominantly treated as an opaque classification task where all temporal frames are aggregated equally. This ignores that different phonetic categories carry vastly different amounts of discriminative information. To address this, we propose a phoneme-guided cross-attention framework that transforms detection into an interpretable, phonetically grounded process. We factorize the spoofing posterior $P(\text{spoofed}\mid X, W)$, conditioned on the acoustic representation $X$ and the phonetic posteriorgram $W$. The resulting factorization can be written as $P(\text{spoofed} \mid X, W) = \sum_{i=1}^{M} w_i \cdot P(\text{spoofed} \mid X, Z = z_i)$, where $M$ denotes the number of phonetic classes, $P(\text{spoofed} \mid X, Z = z_i)$ is the spoofing probability for the $i$-th phonetic class $z_i$ conditioned on $X$, and each $w_i$ is the prevalence of phonetic class $z_i$ in the utterance. Our transformer-based architecture instantiates this through a cross-attention block in which phonetic queries selectively probe information in acoustic keys and values, with softmax-normalized pooling supplying explicit phone-presence weights. Unlike prior approaches that rely heavily on post-hoc explainability methods, our framework offers phonetic-explainability-by-design. We evaluate the framework on an LJSpeech-derived corpus, ASVspoof 2019 LA, and ASVspoof 5 Track 1. Per-phone importance rankings reveal that discriminative power concentrates on articulatory categories that generative models struggle to reproduce faithfully. Stops, fricatives, affricates, nasals, and silence-boundary closures rank most discriminative, while periodic vowels and semivowels rank lower. Beyond competitive performance, our model provides structural interpretability, yielding an inspectable per-articulatory category breakdown of the final verdict.
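
The factorization $P(\text{spoofed} \mid X, W) = \sum_{i=1}^{M} w_i \cdot P(\text{spoofed} \mid X, Z = z_i)$ can be illustrated with a minimal numeric sketch. This is not the authors' model: the function names, the raw presence scores, and the per-class probabilities below are hypothetical, and the only parts taken from the abstract are the softmax-normalized weights and the weighted sum.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def spoof_posterior(presence_scores, per_class_probs):
    """Return the overall spoof probability and the per-class contributions.

    presence_scores: hypothetical raw phone-presence scores, one per class.
    per_class_probs: hypothetical P(spoofed | X, Z = z_i), one per class.
    """
    if len(presence_scores) != len(per_class_probs):
        raise ValueError("one presence score is needed per phonetic class")
    weights = softmax(presence_scores)  # the w_i of the factorization
    contributions = [w * p for w, p in zip(weights, per_class_probs)]
    return sum(contributions), contributions


# Hypothetical utterance with M = 3 phonetic classes.
total, parts = spoof_posterior([2.0, 0.5, -1.0], [0.9, 0.4, 0.1])
```

Because the weights sum to one, the list of contributions is an additive breakdown of the final score, which is the kind of per-class inspection the abstract describes.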

2606.15453 2026-06-16 cs.AR cs.LG New submission

A Spatio-Temporal Expert Prefetching Framework for Efficient MoE-based LLM Inference

Yingnan Zhao, Razvan Bunescu, Ahmed Louri, Avinash Karanth, Ke Wang

Affiliations: George Washington University; University of North Carolina at Charlotte; Ohio University

AI summary: To address expert-loading latency in MoE LLM inference, the paper analyzes spatio-temporal correlations in expert selection and proposes ST-MoE, which combines lightweight runtime prediction with a reconfigurable hardware design to prefetch experts, overlapping computation with loading and improving performance and energy efficiency.

Abstract

Mixture-of-Experts (MoE) based large language models (LLMs), such as Qwen and DeepSeek, have recently emerged as an effective approach to improving model capacity without proportionally increasing computational cost. By replacing the conventional feed-forward network in dense LLMs with a set of experts and activating only a subset of them for each input token, MoE models significantly increase the total number of parameters while keeping the per-token computation relatively manageable. However, this dynamic and irregular expert activation pattern also introduces substantial expert loading overhead during inference, since the required experts must be fetched on demand according to token-dependent routing results. As a result, expert loading latency becomes a major source of performance and energy inefficiency. To this end, we first perform a comprehensive analysis of expert selection behavior in various MoE-based LLMs and applications, including language understanding and code generation. Our analysis reveals that, within each application domain, expert requests exhibit strong correlation across both adjacent MoE layers and consecutive decoding tokens, making future expert activations predictable. Based on this insight, we propose ST-MoE, a spatio-temporal expert prefetching framework that proactively stages experts ahead of use to overlap expert loading with ongoing computation. ST-MoE combines a lightweight runtime prediction mechanism that preserves the original routing behavior with a reconfigurable hardware design that efficiently supports dynamic expert prefetching. The combined effect of the prediction mechanism with the supporting hardware significantly improves MoE inference performance and energy efficiency while preserving model inference accuracy.
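
The idea that correlation across adjacent MoE layers makes future expert activations predictable can be sketched with a simple count-based predictor. This is an illustrative assumption, not ST-MoE's prediction mechanism, which the abstract does not specify: the class `TransitionPrefetcher` and its methods are hypothetical, and the sketch covers only the layer-to-layer correlation, not the correlation across consecutive decoding tokens.

```python
from collections import Counter, defaultdict


class TransitionPrefetcher:
    """Hypothetical predictor: count which next-layer experts follow each expert."""

    def __init__(self):
        # (layer, expert) -> Counter of experts observed in layer + 1
        self.counts = defaultdict(Counter)

    def observe(self, layer, experts_here, experts_next):
        """Record one routing decision seen at runtime."""
        for expert in experts_here:
            self.counts[(layer, expert)].update(experts_next)

    def predict(self, layer, experts_here, k):
        """Return up to k next-layer experts to prefetch, most frequent first."""
        votes = Counter()
        for expert in experts_here:
            votes.update(self.counts[(layer, expert)])
        return [expert for expert, _ in votes.most_common(k)]


prefetcher = TransitionPrefetcher()
prefetcher.observe(layer=0, experts_here=[1], experts_next=[4, 5])
prefetcher.observe(layer=0, experts_here=[1], experts_next=[4])
candidates = prefetcher.predict(layer=0, experts_here=[1], k=1)
```

A predictor of this kind only suggests what to load early; the router's actual choice is left unchanged, which matches the abstract's requirement that the original routing behavior be preserved.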

2606.15444 2026-06-16 math.OC cs.LG New submission

A Conservation Law for Equilibrium Propagation and Coupled Learning

Joshua A. McGinnis, Adam G. Kline, Yoichiro Mori

Affiliations: SAS, University of Pennsylvania

AI summary: Proves that the physical learning methods coupled learning and equilibrium propagation conserve a mass-like quantity in the continuous-time, small-nudging limit, and analyzes the effect on training dynamics in linear circuits.

Abstract

In this paper we show that the physical learning methods known as coupled learning (CL) and equilibrium propagation (EP) conserve a mass-like quantity in the trainable parameters in the continuous-time, small-nudging limit. We prove that this conservation holds in a broad range of physically relevant settings. We then show that the conservation law constrains the training dynamics in a way that makes convergence reliable in important settings for linear circuits. We conclude by discussing some practical implications of this conservation law.

2606.15443 2026-06-16 math.OC cs.LG New submission

Coercivity and Local Convergence of Physical Learning in Linear Circuits

Joshua A. McGinnis, Xinbo Li, Yoichiro Mori

Affiliations: Department of Mathematics, University of Pennsylvania, Philadelphia

AI summary: For linear circuits, analyzes the local convergence of three physical learning methods (equilibrium propagation, coupled learning, and its adjoint variant) in the small-nudging limit. A coercivity condition, expressed as a rank condition based on network structure, guarantees exponential convergence, and the non-degenerate case is generic.

Abstract

Physical learning methods train physical networks to perform computational tasks using only local update rules, exploiting the physics of the system to handle the global transfer of information. We provide the first local convergence analysis of three such methods -- Equilibrium Propagation (EP), Coupled Learning (CL), and a new method we call Adjoint Coupled Learning (AL) -- for linear circuits, in the limit of small-nudging for both discrete and continuous time. EP and AL perform gradient descent on a natural loss function, while CL follows modified dynamics with an additional cubic correction. Assuming the existence of a solution, we identify a coercivity condition, expressed as a rank condition on a matrix built from the network's incidence structure, under which the training loss decays exponentially and the parameters converge to the solution manifold. We show that coercivity can fail by exhibiting a kite circuit in which a symmetry causes the coercivity constant to degenerate on the solution manifold, but prove using Sard's theorem that such degeneracies are non-generic: coercivity holds at every point of the solution manifold for almost every choice of desired output.

2606.15442 2026-06-16 stat.ML cs.LG New submission

The Reverse Telescoping Coordinate System for Positive Definite Matrices: Geometry, Computation, and Generative Modeling

Anindya Bhadra

Affiliations: Purdue University

AI summary: Proposes a new unconstrained coordinate system that represents symmetric positive definite matrices through a reverse telescoping map, yielding a Jacobian that depends only on the log-determinant and a symbolic representation of both the matrix and its inverse, and designs a split volume-shape flow model for generative modeling.

Abstract

We design a new unconstrained coordinate system where a $p\times p$ symmetric positive definite (SPD) matrix $\Theta$ is represented by a reverse telescoping map $\Theta(x)=\mathrm{RT}(x)$, with $x=(v,d,r)\in\mathbb{R}\times\mathbb{R}^{(p-1)}\times\mathbb{R}^{p(p-1)/2}$, representing respectively the log volume or log determinant; and the shape, as encoded by log relative diagonal scales and partial covariances among the nodes. This construction results in important properties not available in other charts, e.g., the matrix logarithm, such as a Jacobian that depends only on the log-determinant. A useful feature of our construction is that $x$ contains a lossless symbolic representation of both the matrix and its inverse. Many important computations involving a matrix and its inverse can be performed in $O(p^2)$ in the transformed domain, while it is the rendering of results in matrix forms (on demand) that must incur an $O(p^3)$ cost. Moreover, two unit-determinant matrices in the transformed domain can be joined by a straight line with pathwise unit determinant. For generative modeling, this allows designing a split volume-shape flow model trained by conditional flow matching for transporting the shape over the unit-determinant path, with a separate one-dimensional flow for transporting the volume or the determinant. The forbidding SPD constraint, tamed thus into a powerful guiding force, leads to the surprising insight that it is in some sense easier to design a volume-normalized shape flow for SPD compared to the unconstrained $\mathbb{R}^{p\times p}$, with no intrinsic notion of volume to aid normalization, unlike the determinant of SPD matrices. We apply our construction for up to $p=200$ in generative modeling of SPD matrices on a difficult synthetic bimodal target, and in generating brain connectivity networks by models trained on fMRI data; as well as in intrinsic diffusion on the SPD manifold.

2606.15441 2026-06-16 cs.CR cs.AI New submission

Defending against Adaptive Prompt Injection Attacks via Reasoning-enabled Task Alignment

Lipeng He, Yihan Wang, Jiawen Zhang, N. Asokan

Affiliations: University of Waterloo; Zhejiang University; KTH Royal Institute of Technology

AI summary: Proposes RETA, which trains a defender with multi-objective reinforcement learning grounded in the user task, uses chain-of-thought reasoning to verify that actions are consistent with the task, and generates adversarial examples with a dictionary-learning diversity reward. Across six adaptive attacks, the average attack success rate is below 4%.

Abstract

Indirect prompt injection attacks hijack LLM-based agents by embedding malicious instructions in third-party data that the agent retrieves during task execution. Existing defenses report near-zero attack success rate on static benchmarks, yet recent adaptive evaluations show that these results collapse once the attacker is allowed to optimize against the deployed defense. In this work, we trace this collapse to two failure modes. First, existing defense methods are confined to recognizing specific attack patterns, rather than assessing whether the intent of every embedded instruction is relevant to the user task. Second, training-based defenses, which otherwise offer the strongest safety-utility trade-off, assemble their adversarial examples from a handful of hand-crafted templates, and the resulting defender fails to generalize outside that narrow strategy distribution. To address these gaps, we propose RETA, a training-based method that grounds defense decisions on the user tasks rather than attacker-controlled data. At each tool-output step, the defender undertakes chain-of-thought reasoning verifying that its actions are consistent with the user task. Leveraging red-teaming, a simulated attacker synthesizes adversarial training data and receives a dictionary-learning diversity reward, achieving broad coverage of injection-reformulation strategies. Together, these allow the defender to be optimized via multi-objective reinforcement learning and achieve better safety-utility trade-off. Across six black-box adaptive attacks, RETA keeps every per-attack ASR below 10%, with average ASR of 2.92% and 3.75% on the two target models, while preserving most utility under attack and on clean inputs.

2606.15393 2026-06-16 stat.ML cs.LG stat.ME New submission

Finite Resources False Discovery Rate Control in Structured Hypothesis Spaces

Binyamin Perets, Shie Mannor

Affiliations: Technion – Israel Institute of Technology; NVIDIA

AI summary: For settings with finitely many null-distribution samples and structured hypothesis spaces, proposes a reproducing-kernel-based framework with two decision rules that trade off exact FDR control against statistical power, and optimizes resource allocation.

Abstract

Scientific discovery relies on large-scale hypothesis testing. However, the capacity to identify true discoveries while controlling false discovery faces major challenges: obtaining relevant reference data (the null distribution) is resource-intensive, leaving finite-data uncertainty, and the procedure should account for the inherent structure in the hypothesis space, when such structure exists. Here, we present a framework for controlling the false discovery rate both when each hypothesis is evidenced only by a finite count of null draws, leaving its p-value uncertain, and when the hypothesis space carries arbitrary structure, requiring only that the structure be represented through a suitable reproducing kernel. We present two decision rules that are both robust to structural mis-specification, yet offer a distinct trade-off between exact FDR control and statistical power. The first rule guarantees exact FDR control; the second maximizes power by adapting mirror-statistic control into count space, utilizing an analytical framework to assess FDR control when exact mirror symmetry is relaxed. Furthermore, the tractability gained by the RKHS framework allows us to directly investigate finite-data uncertainties, which we leverage to suggest a policy for the efficient allocation of null distribution samples.
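
The effect of a finite count of null draws on a p-value can be seen in a minimal sketch using two textbook tools: the empirical (Monte Carlo) p-value and the Benjamini-Hochberg step-up procedure. Neither is one of the paper's two decision rules, which the abstract does not spell out; the function names and the example numbers are hypothetical.

```python
def empirical_p_value(observed, null_draws):
    """Monte Carlo p-value with the usual +1 correction.

    With n null draws the smallest attainable value is 1 / (n + 1),
    which is one concrete form of finite-data uncertainty.
    """
    exceed = sum(1 for draw in null_draws if draw >= observed)
    return (1 + exceed) / (1 + len(null_draws))


def benjamini_hochberg(p_values, alpha):
    """Return sorted indices rejected by the standard BH procedure at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= alpha * rank / m:
            cutoff = rank  # keep the largest rank that passes
    return sorted(order[:cutoff])


# Hypothetical example: one statistic compared against four null draws.
p = empirical_p_value(3.0, [0.0, 1.0, 2.0, 4.0])
rejected = benjamini_hochberg([0.01, 0.02, 0.03, 0.5], alpha=0.05)
```

With only four null draws the p-value cannot fall below 0.2, so no hypothesis could be rejected at common levels however extreme its statistic; this illustrates why the allocation of null samples matters.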

2606.15376 2026-06-16 cs.DC cs.AI cs.MA New submission

CoAgent: Concurrency Control for Multi-Agent Systems

Hongtao Lyu, Dingyan Zhang, Mingyu Wu, Xingda Wei, Haibo Chen

Affiliations: Shanghai Jiao Tong University

AI summary: To handle conflicts when agents in a multi-agent LLM system concurrently access shared state, proposes the MTPO protocol, in which each agent judges conflicts and repairs its own plan. This yields serializable execution that is faster while keeping correctness close to serial.

Comments: 14 pages, 7 figures. Submitted to ATC 2026

Abstract

Multi-agent LLM systems -- coding agents, devops agents, document agents -- now routinely run several agents in parallel against the same git tree, Kubernetes cluster, or document. As soon as two of them mutate shared state, they enter the regime classical concurrency control has studied for decades, but classical mechanisms fit LLM agents poorly. A single agent transaction spans minutes of inference, read sets are broad and opaque rather than statically inferable, and the live state agents act on admits neither fork nor buffer, so writes take effect the moment they execute. Locks block long inference intervals; OCC abort-and-retry discards minutes of work on every conflict. This paper builds concurrency control on a capability classical transactions lack: the LLM inside each agent can judge whether a conflicting write invalidates its plan, and can repair exactly the operations that depended on it. Control therefore turns advisory: the runtime informs, the agent repairs. Our protocol, MTPO (Monotonic Trajectory Pre-Order), fixes a serialization order at launch, serves each read the order-filtered value, and applies writes speculatively in place; a one-way notification asks an affected reader to re-judge and patch its plan, while the framework mechanically undoes and reorders misplaced writes through the saga-style inverse each tool registers in advance. At quiescence the run is serializable in the pre-decided order. We realize MTPO as CoAgent, toolcall middleware whose privileged ToolSmith grows footprint-declared, undoable tools online. On ten contended workloads, CoAgent stays within 5% of serial correctness at a $1.4\times$ speedup and near-serial token cost, where 2PL and OCC surrender nearly all concurrency gains; on a bash-only target system, it grows a 25-tool library online and lifts the task pass rate from 45/71 to 63/71 at $0.80\times$ the time and $0.86\times$ the cost.

2606.15366 2026-06-16 eess.SY cs.RO cs.SY math.OC New submission

Robust Conformal CBF and CLF Controllers via Iterative Policy Updates

Omid Mirzaeedodangeh, Eliot Shekhtman, Nikolai Matni, Lars Lindemann

Affiliations: Automatic Control Laboratory, ETH Zürich; Computer and Information Science, University of Pennsylvania; Electrical and Systems Engineering, University of Pennsylvania

AI summary: When conformal prediction is embedded in robust control, distribution shift can invalidate safety and stability guarantees. The paper proposes a framework that iteratively updates the policy, combining adversarially robust conformal prediction with a distribution shift budget to obtain guarantees across episodes.

Abstract

Conformal prediction (CP) has been used to obtain probabilistic bounds on the error between a learned dynamics model and the true but unknown system. Such CP bounds can then be embedded into robust control Lyapunov function (CLF) and control barrier function (CBF) frameworks. However, such an approach does not retain stability/safety guarantees because of the distribution shift between the closed-loop trajectory distribution under the deployed CLF/CBF policy and the trajectory distribution from which the CP bound and its guarantees were derived. To address this issue, we propose an episodic framework that iteratively updates the robust conformal CLF/CBF policy while maintaining stability/safety guarantees across episodes. We achieve this by (1) using adversarially robust conformal prediction, and (2) quantifying a distribution shift budget that allows us to control how much the model error can increase across policy updates. This distribution shift budget is derived via a closed-loop trajectory sensitivity analysis, yielding an implicit and an explicit update rule for the CP bound. We analyze convergence of our algorithm, which we demonstrate on three case studies. To the best of our knowledge, these are the first results that provide stability/safety guarantees for robust conformal CBF/CLF policies.
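
The kind of probabilistic error bound that conformal prediction supplies can be sketched with standard split conformal prediction. This is the textbook construction, assumed here for illustration; it is not the adversarially robust variant or the update rules proposed in the paper, and the function name and calibration scores are hypothetical.

```python
import math


def conformal_bound(scores, alpha):
    """Split conformal bound from calibration scores.

    scores: model errors measured on held-out calibration data, for example
            the norm of (true next state - predicted next state).
    Returns the ceil((n + 1) * (1 - alpha))-th smallest score. A fresh
    exchangeable score falls below it with probability at least 1 - alpha.
    """
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this level
    return sorted(scores)[rank - 1]


# Hypothetical calibration errors from seven trajectories.
bound = conformal_bound([5, 1, 3, 2, 4, 7, 6], alpha=0.25)
```

The guarantee rests on the calibration scores and the test score being exchangeable. A policy update changes the closed-loop trajectory distribution and breaks that assumption, which is the distribution shift the abstract sets out to handle.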

2606.15358 2026-06-16 cs.HC cs.AI New submission

Cognitive Trajectory Modeling: Quantifying Human-AI Co-Creation through Cognitively Grounded Interaction Trajectories

Nicholas Davis

Affiliations: Co-Creative AI Consulting

AI summary: Proposes Cognitive Trajectory Modeling (CTM), a theory that quantifies interaction dynamics in human-AI co-creation through cognitive trajectories and attractor landscapes, distinguishes cognitive trajectories from interaction traces, and offers a framework for studying co-creative AI and human-AI interaction.

Abstract

Co-creative AI research increasingly seeks methods capable of representing how interaction dynamics evolve through time. While many existing approaches focus on observable interaction characteristics, interaction metrics, behavioral coding schemes, or activity traces, these methods often struggle to capture higher-order interaction dynamics, including how collaborative processes reorganize, stabilize, regulate, and evolve through time. This paper introduces Cognitive Trajectory Modeling (CTM) as a cognitive theory of interaction dynamics that conceptualizes cognition, interaction, and creative processes as temporally organized trajectories unfolding across cognitively meaningful attractor landscapes. CTM builds upon the theoretical foundations of the Enactive Model of Creativity and Creative Sense-Making (CSM), revisiting the role of sense-making curves and cognitive trajectories in representing co-creative interaction dynamics. We formalize this perspective through the Cognitive Trajectory Principle, which states that temporal representations are only theoretically interpretable as cognitive trajectories when their underlying states possess directional cognitive meaning. Building on this principle, CTM generalizes the notion of cognitive trajectories beyond any particular coding scheme and provides a broader framework for modeling interaction dynamics through trajectories unfolding across meaningful attractor landscapes. We further distinguish cognitive trajectories from interaction traces and situate CTM within a broader hierarchy of cognitive, interaction, and domain dynamics. More broadly, we argue that understanding co-creative systems requires methods capable of modeling how cognition and interaction dynamics unfold through time. CTM provides a foundation for studying interaction dynamics across co-creative AI and human-AI interaction.

2606.15356 2026-06-16 physics.flu-dyn cs.LG New submission

ShipNet: A Geometric Deep Learning Surrogate for Real-Time Ship Hydrodynamics

Kirsten Odendaal, George Drakoulas

Affiliations: Maritime Research Institute, Wageningen, Netherlands; Damen Research, Gorinchem, Netherlands

AI summary: Proposes ShipNet, a geometric deep-learning surrogate that predicts pressure distributions and wave fields directly from hull geometry and speed. It reaches R² of 0.98 and 0.91 on a held-out test set, with inference more than 550 times faster than a potential-flow solver.

Abstract

Accurate prediction of hydrodynamic performance is central to ship design, yet high-fidelity computational fluid dynamics remains prohibitively expensive for large-scale parametric exploration. This motivates the development of data-driven surrogate models that provide rapid approximations to hydrodynamic predictions at substantially reduced cost. We present ShipNet, a geometric deep-learning surrogate that predicts both hull-surface pressure distributions and far-field free-surface wave patterns directly from hull geometry and speed. The network employs a regularized dynamic graph convolutional backbone on hull point clouds, with a multi-head decoder for simultaneous near-body pressure and free-surface elevation outputs. Training data consist of 420 inviscid free-surface simulations generated using a potential-flow panel method for two parent yacht hulls, each parameterized into 70 variants and evaluated at three speeds. ShipNet predicts a per-point pressure coefficient and a two-dimensional wave elevation map using a composite loss that combines point-wise regression and image-structure terms. On a geometry-held-out test set, ShipNet achieves $R^2=0.98$ for hull pressure and $R^2=0.91$ for wave fields. Inference requires approximately 0.15 s per case, yielding a speedup of over $550\times$ relative to the potential-flow solver on conventional hardware. Limitations include the restricted geometry and speed ranges and the inviscid training data, while future work will extend the model to high-fidelity viscous simulations with physics-informed regularization.

2606.15352 2026-06-16 eess.IV cs.CV cs.GR New submission

Chroma-gated, differentiable OKLCH interpolation: Continuous Oklab fallback for color-cast reduction

Naoyuki Uchida

发表机构 * Independent Researcher(独立研究者)

AI总结 针对OKLCH插值在中性轴附近的两种色偏问题,提出一种可微分的色度门控函数,连续混合OKLCH和线性Oklab路径,在不依赖端点测试的情况下统一处理两种色偏,并验证了其有效性。

Comments 14 pages, 5 figures. Ancillary files: reproducibility scripts (symbolic verification, evaluation, and figure generation)

详情
AI中文摘要

OKLCH——Ottosson的Oklab颜色空间的圆柱形式(亮度、色度、色调)——是CSS Color 4推荐的用于渐变和color-mix()的插值空间,现已广泛部署。然而,其极坐标参数化在中性轴附近以两种方式产生色偏:(1)两个彩色端点之间的色调间绕行,经过非预期的色调(蓝色到黄色明显经过绿色);(2)当一个端点为消色差时,产生离线弯曲。现有补救措施统一为二值化——仅在消色差端点触发的阈值开关——因此它们仅处理(2);对于彩色对,它们都退化为原始OKLCH,未处理(1)色调间色偏。我们引入连续Oklab回退(COFb),一个单参数、可微分的色度门控$w(C)=C^n/(C^n+σ^n)$,随着色度下降,将OKLCH路径连续混合到线性Oklab路径。单个门控减少了二值化家族未处理的(1)色偏,并统一处理(1)和(2),无需任何端点测试。我们刻画了一个色偏-色调权衡边界,采用默认值($n=1$,有理Michaelis-Menten形式;对于典型sRGB调色板,$σ\approx0.19$,基于归一化无关的半色偏准则),并符号验证了门控的性质。在默认值下,COFb将色调间路径绕行减半(平均横向偏差-49.5%,色度加权色调偏移-35.5%)。我们还说明了该方法的局限性:仅针对(2),二值化开关仍然更好,并且像任何笛卡尔混合一样,COFb不保持色度。在部署中,COFb完全在普通Oklab (a,b)到sRGB中运行,因此它作为一种回退,在无法使用现代CSS颜色插值(color-mix(in oklch)等)的场合——旧引擎、图像和视频管线或GPU着色器——提供相同的减少色偏的渐变。

英文摘要

OKLCH -- the cylindrical (lightness, chroma, hue) form of Ottosson's Oklab color space -- is the interpolation space recommended by CSS Color 4 for gradients and color-mix(), and it is now broadly deployed. Its polar parameterization, however, casts color near the neutral axis in two ways: (1) an inter-hue detour between two chromatic endpoints that sweeps through an unintended hue (blue to yellow visibly passing through green), and (2) an off-line bow when one endpoint is achromatic. Existing remedies are uniformly two-valued -- a threshold switch that fires only at an achromatic endpoint -- so they address only (2); on chromatic pairs every one of them reduces to raw OKLCH, leaving the (1) inter-hue cast untreated. We introduce Continuous Oklab fallback (COFb), a one-parameter, differentiable chroma gate $w(C)=C^n/(C^n+σ^n)$ that continuously blends the OKLCH path toward the linear Oklab path as chroma falls. A single gate reduces the (1) cast that the two-valued family leaves untreated and unifies the handling of (1) and (2) without any endpoint test. We characterize a cast-hue trade-off frontier, adopt a default ($n=1$, the rational Michaelis-Menten form; $σ\approx0.19$ for a typical sRGB palette, from a normalization-independent cast-half criterion), and verify the gate's properties symbolically. At the default, COFb halves the inter-hue path detour (mean lateral deviation -49.5%, chroma-weighted hue excursion -35.5%). We also state the method's limits: on (2) alone the two-valued switch remains better, and like any Cartesian blend COFb does not preserve chroma. In deployment, COFb runs entirely in plain Oklab (a,b) to sRGB, so it serves as a fallback that delivers the same cast-reduced gradients where modern CSS color interpolation (color-mix(in oklch) and the like) is unavailable -- older engines, image and video pipelines, or GPU shaders.
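
The gate $w(C)=C^n/(C^n+σ^n)$ and the blend it controls can be sketched in a few lines. Only the gate formula and the defaults $n=1$, $σ\approx0.19$ come from the abstract. Everything else is an assumption made for illustration: the function names, the choice to evaluate the gate on the chroma of the linear Oklab point, the shorter-arc hue interpolation, and the restriction to the $(a,b)$ plane (lightness is left out because both paths interpolate it linearly).

```python
import math


def chroma_gate(c, sigma=0.19, n=1.0):
    """w(C) = C^n / (C^n + sigma^n): near 0 at low chroma, approaching 1 at high chroma."""
    if c <= 0.0:
        return 0.0
    cn = c ** n
    return cn / (cn + sigma ** n)


def lerp(x, y, t):
    return x + (y - x) * t


def blend_ab(ab0, ab1, t, sigma=0.19, n=1.0):
    """Blend the polar (OKLCH) path toward the linear Oklab path in the (a, b) plane."""
    # Linear Oklab path: straight line between the two endpoints.
    lin_a, lin_b = lerp(ab0[0], ab1[0], t), lerp(ab0[1], ab1[1], t)
    # OKLCH path: interpolate chroma and hue, taking the shorter hue arc.
    c0, c1 = math.hypot(*ab0), math.hypot(*ab1)
    h0, h1 = math.atan2(ab0[1], ab0[0]), math.atan2(ab1[1], ab1[0])
    dh = (h1 - h0 + math.pi) % (2.0 * math.pi) - math.pi
    c, h = lerp(c0, c1, t), h0 + dh * t
    pol_a, pol_b = c * math.cos(h), c * math.sin(h)
    # Assumption: gate on the chroma of the linear point at this t.
    w = chroma_gate(math.hypot(lin_a, lin_b), sigma, n)
    return lerp(lin_a, pol_a, w), lerp(lin_b, pol_b, w)


# Hypothetical endpoints on opposite sides of the neutral axis.
midpoint = blend_ab((0.10, 0.15), (-0.12, -0.10), 0.5)
```

Under these assumptions both paths coincide at the endpoints, so the blend reproduces the endpoint colors exactly, and the result leans on the straight Oklab line wherever the path passes close to the neutral axis.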

2606.15348 2026-06-16 q-bio.NC cs.AI 新提交

Intrinsic Computational Functionalism and Simulated Consciousness

内在计算功能主义与模拟意识

Ryota Kanai, Shuqin Ma

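Affiliations: Araya Inc.; School of Philosophy, Fudan University; Sussex Centre for Consciousness Science, University of Sussex

AI summary: Starting from Intrinsic Computational Functionalism, the paper proposes a mechanism-enriched canonical structure and argues that, if consciousness is computationally constituted, any system satisfying the Intrinsic Causal-Computational Realization relation (biological, artificial, or simulated) realizes the same consciousness-relevant properties.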

详情
AI中文摘要

对人工或模拟意识的一个常见反对意见是,模拟的大脑并不比模拟的水更湿。我们从内在计算功能主义(ICF)的角度来回应:如果意识是由计算构成的,那么它不依赖于外部强加的描述,而是依赖于系统凭借其自身的因果-动力学组织所物理实现的计算结构。在之前的工作中,我们将规范功能主义发展为此反解释主义纲领的一个数学精确的特例,通过固定接口下的完整未来输入-输出角色来识别功能状态。这里我们论证,这种输入-输出构造虽然重要,但并不完整:作为ICF的一个行为边界情况,它使得查找表和展开的系统在规范上等价,只要它们保持相同的边界行为。一个与意识相关的规范表示必须转而包含属于相关内在组织的内部机制、干预和联合读出。因此,我们定义了一个机制丰富的规范结构,并用它来制定内在因果-计算实现(ICCR),这是一种保持物理实现、内在状态个体化、转移结构、干预轮廓以及相关主体-身体-世界边界的实现关系。核心结果是条件性的:如果意识属性是内在因果-计算组织的不变量,那么任何满足ICCR的系统都实现相同的意识相关属性,无论是生物的、人工的还是模拟的。我们讨论了包括生物自然主义和整合信息理论在内的反对意见。我们得出结论,要否认模拟具有意识,必须识别出模拟未能实现的与意识相关的内在因果-计算结构。

英文摘要

A common objection to artificial or simulated consciousness is that a simulated brain is no more conscious than simulated water is wet. We address this from the perspective of Intrinsic Computational Functionalism (ICF): if consciousness is computationally constituted, it depends not on externally imposed descriptions but on the computational structures a system physically realizes in virtue of its own causal-dynamical organization. In previous work we developed Canonical Functionalism as a mathematically precise special case of this anti-interpretivist program, identifying functional states by their complete future input-output roles under a fixed interface. Here we argue that this input-output construction, though important, is incomplete: as a behavioral boundary case of ICF, it makes lookup tables and unfolded systems that preserve the same boundary behavior canonically equivalent. A consciousness-relevant canonical representation must instead include internal mechanisms, interventions, and joint readouts belonging to the relevant intrinsic organization. We therefore define a mechanism-enriched canonical structure and use it to formulate Intrinsic Causal-Computational Realization (ICCR), a realization relation preserving physical implementation, intrinsic state individuation, transition structure, intervention profiles, and the relevant agent-body-world boundary. The central result is conditional: if conscious properties are invariants of intrinsic causal-computational organization, then any system satisfying ICCR realizes the same consciousness-relevant properties, whether biological, artificial, or simulated. We discuss objections including biological naturalism and integrated information theory. We conclude that to deny consciousness to a simulation, one must identify a consciousness-relevant intrinsic causal-computational structure that the simulation fails to realize.

2606.15344 2026-06-16 cond-mat.dis-nn cs.LG physics.optics quant-ph 新提交

Generative modelling powered by room-temperature polariton condensates

基于室温极化激元凝聚的生成建模

Yuan Wang, Marcin Muszynski, Avinash Dash, Rishabh Kaurav, Vinod M. Menon, Oleksandr Kyriienko

Affiliations: School of Mathematical and Physical Sciences, University of Sheffield, Sheffield S10 2TN, United Kingdom; Department of Physics, City College of New York, New York, NY 10031, USA; Physics Doctoral Program, Graduate Center of the City University of New York, New York, NY 10016, USA; Chemistry Doctoral Program, Graduate Center of the City University of New York, New York, NY 10016, USA

AI summary: Uses the nonlinear many-body dynamics and intrinsic stochasticity of room-temperature exciton-polariton condensates in organic dye microcavities as a physical stochastic transformation layer in a generative adversarial network, achieving conditional digit-to-image translation that outperforms digitally injected perturbations.

Comments 9 pages and 4 figures in the main text; 17 pages SM; codes to be released

详情
AI中文摘要

生成建模需要高效的随机非线性变换以及能够自然实现这些变换的物理平台。我们实验证明,工作在强光-物质耦合机制下的非线性光学系统可以作为条件生成建模的物理变换层。具体而言,我们开发了一个工作流程,其中在有机染料微腔中形成的室温激子-极化激元凝聚体作为生成对抗网络中的物理随机变换,实现条件数字到图像翻译。通过利用极化激元凝聚体的非线性多体动力学和固有随机性,该工作流程优于基于数字注入扰动的基线方法。我们发现,与数字采样和基于激光的系统相比,通过生成对抗网络(Polariton GAN)进行的极化激元增强采样提高了初始分数、数字保留精度和结构相似性。我们进一步表明,空间相关的输出变化可以自然地正则化对抗训练并增强输出多样性。我们的结果确立了极化激元凝聚作为生成建模的新计算资源,为物理增强机器学习系统开辟了道路。

英文摘要

Generative modelling requires efficient stochastic nonlinear transformations and physical platforms that can naturally realise them. We experimentally demonstrate that nonlinear optical systems operating in the strong light-matter coupling regime can serve as physical transformation layers for conditional generative modelling. Specifically, we develop a workflow in which room-temperature exciton-polariton condensates formed in organic dye microcavities act as a physical stochastic transform within a generative adversarial network and enable conditional digit-to-image translation. By using the nonlinear many-body dynamics and intrinsic stochasticity of polariton condensates, the workflow outperforms baseline approaches based on digitally injected perturbations. We find that polariton-enabled sampling via generative adversarial network (Polariton GAN) yields improved inception score, digit preservation accuracy and structural similarity compared with both digital sampling and laser-based systems. We further show that spatially correlated output variations can naturally regularise adversarial training and enhance output diversity. Our results establish polariton condensation as a new computational resource for generative modelling, opening a pathway towards physics-enhanced machine learning systems.

2606.15331 2026-06-16 cs.IR cs.AI 新提交

HoloRec: Holistic Encoding and Interleaved Reasoning for Generative Recommendation

HoloRec:面向生成式推荐的整体编码与交错推理

Shuqi Zhao, Jingsong Su, Xiang Liu, Xingzhi Yao, Yiming Qiu, Huimu Wang, Liang Lin, Pengbo Mo, Mingming Li, Jiao Dai, Jizhong Han, Songlin Hu

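Affiliations: Institute of Information Engineering, Chinese Academy of Sciences; School of Artificial Intelligence, Beijing Normal University; JD.com

AI summary: Proposes HoloRec, which builds a hierarchical semantic encoding matrix through multi-granularity nested residual quantization to enable endogenous chain-of-thought reasoning without external annotations, with especially large accuracy gains in sparse scenarios.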

详情
AI中文摘要

将任务建模为序列生成的生成式推荐模型克服了传统级联架构的目标碎片化问题,但现有方法仍存在缺乏层次结构用于多步推理的扁平语义表示,以及需要昂贵标注且与生成目标脱节的外部构建思维链(CoT)等问题。我们提出HoloRec,一种内生的思维链推荐机制,通过多粒度嵌套残差量化构建层次语义编码矩阵,并由整体重建损失优化,统一了表示、推理和生成。HoloRec支持两种推理模式:非思考模式使用轻量级多粒度监督对齐进行快速预测,思考模式采用交错推理方案动态生成CoT步骤,将推理直接嵌入生成过程,无需外部数据。在多个公开推荐数据集上的实验表明,HoloRec持续优于基线,在稀疏场景下尤其显著,且思考模式在仅增加适度推理开销的情况下实现了比非思考模式更高的准确率。

英文摘要

Generative recommendation models that formulate the task as sequence generation overcome the objective fragmentation problem of traditional cascade architectures, yet existing approaches still suffer from flat semantic representations lacking hierarchical structure for multi-step reasoning and an externally constructed chain-of-thought (CoT) that requires expensive annotations and remains disconnected from the generation objective. We propose HoloRec, an endogenous chain-of-thought recommendation mechanism that unifies representation, reasoning, and generation by constructing a hierarchical semantic encoding matrix via multi-granularity nested residual quantization optimized by a holistic reconstruction loss. HoloRec supports two inference modes: a non-thinking mode that uses lightweight multi-granularity supervised alignment for fast prediction, and a thinking mode that employs an interleaved reasoning scheme to generate CoT steps on the fly, directly embedding reasoning into the generation process without external data. Experiments on multiple public recommendation datasets demonstrate that HoloRec consistently outperforms baselines, with especially significant gains in sparse scenarios, and the thinking mode achieves better accuracy than the non-thinking mode with only modest inference overhead.

2606.15313 2026-06-16 eess.AS cs.SD 新提交

DDPO-VC: Speaker De-Identification via Diffusion Denoising Policy Optimization

DDPO-VC:基于扩散去噪策略优化的说话人去识别

Liming Wang, Cody Karjadi, Rhoda Au, James Glass

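Affiliations: University of California, Berkeley

AI summary: Proposes DDPO-VC, a framework that post-trains a diffusion model with reinforcement learning, using reward signals from privacy-focused and utility-focused teachers to balance privacy protection against downstream utility in speaker de-identification; it outperforms several strong baselines.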

详情
AI中文摘要

说话人去识别的一个关键挑战是隐私与效用之间的平衡。许多效用变量,如说话人的认知健康状况,与隐私变量(如说话人身份)相关,违反了基于解耦方法所持有的独立性假设,导致私人信息泄露和下游任务有用信息丢失。为应对这一挑战,我们提出了一个通用框架DDPO-VC,通过基于强化学习的后训练与扩散模型实现说话人去识别。结合来自隐私导向和效用导向教师的奖励信号进行学习,我们的方法在两个常用的痴呆症语音基准上,在隐私保护和认知效用方面均优于各种强去识别方法。请查看我们的代码和演示。

英文摘要

A key challenge in speaker de-identification is balancing privacy and utility. Many utility variables, such as the speaker's cognitive health status, are correlated with privacy variables such as speaker identity. This violates the independence assumption of disentanglement-based approaches and causes both leakage of private information and loss of information useful for downstream tasks. To tackle this challenge, we propose a general framework, DDPO-VC, for speaker de-identification through reinforcement learning-based post-training with diffusion models. Learning from reward signals that combine knowledge from privacy-focused and utility-focused teachers, our method outperforms various strong de-identification methods in both privacy preservation and cognitive utility on two commonly used dementia speech benchmarks. Code is available at https://github.com/cactuswiththoughts/DDPO-VC and a demo at https://cactuswiththoughts.github.io/SpeakerDeID-Demo/ .

2606.15311 2026-06-16 eess.SY cs.RO cs.SY 新提交

Hamilton-Jacobi Reachability-Based Safe Reinforcement Learning for Emergency Collision Avoidance

基于Hamilton-Jacobi可达性的安全强化学习用于紧急碰撞避免

Yuhong Jiang, Shiyue Zhao, Junzhi Zhang, Junfeng Zhang, Xinhan Li, Shijie Zhao, Chengkun He

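Affiliations: Tsinghua University; Jilin University

AI summary: Proposes a safe reinforcement learning framework built on a Hamilton-Jacobi reachability-based motion safety set; the set is approximated from offline data and embedded in a constrained Markov decision process, providing forward-looking safety supervision and policy optimization for emergency collision avoidance.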

Comments Preprint

详情
AI中文摘要

极端驾驶条件下的紧急碰撞避免需要安全关键控制,该控制需考虑未来时间范围内的障碍物接近度和车辆动态稳定性,然而现有方法通常依赖瞬时或局部安全评估。本文提出一种安全强化学习框架,由基于Hamilton-Jacobi (HJ) 可达性的运动安全集引导,为约束策略优化提供前瞻性安全监督。具体而言,通过结合几何碰撞裕度和底盘稳定性极限,构建统一的符号安全函数,并通过可达性分析将其扩展为有限时域运动安全集,该集合表征在未来车辆状态演化下能否维持安全。为实现实用计算,从离线极端驾驶数据中近似运动安全集,减轻基于网格的HJ求解器的计算负担。然后将学习到的运动安全集作为连续安全成本嵌入约束马尔可夫决策过程,并采用PID-Lagrangian策略优化方案自适应调节拉格朗日乘子以强制执行安全约束。在低附着避障场景中的仿真和实车实验表明,所提方法相比基线方法实现了更高的目标到达率、更平滑的避让机动,并保持了更大的统一安全裕度。

英文摘要

Emergency collision avoidance under extreme driving conditions demands safety-critical control that accounts for both obstacle proximity and vehicle dynamic stability over a future time horizon, yet existing methods often rely on instantaneous or local safety evaluations. This paper proposes a safe reinforcement learning framework guided by a Hamilton-Jacobi (HJ) reachability based motion safety set that provides forward-looking safety supervision for constrained policy optimization. Specifically, a unified signed safety function is formulated by combining geometric collision margins and chassis stability limits, and is then extended through reachability analysis into a finite-horizon motion safety set that characterizes whether safety can be maintained under future vehicle state evolution. To enable practical computation, the motion safety set is approximated from offline extreme driving data, mitigating the computational burden of grid-based HJ solvers. The learned motion safety set is then embedded as a continuous safety cost into a constrained Markov decision process, and a PID-Lagrangian policy optimization scheme is employed to adaptively regulate the Lagrange multiplier for safety constraint enforcement. Simulation and real-vehicle experiments on low-adhesion obstacle-avoidance scenarios demonstrate that the proposed method achieves higher goal-reaching rates, produces smoother avoidance maneuvers, and maintains larger unified safety margins than baseline methods.
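The abstract does not give the multiplier update, so the following is only a generic sketch of how a PID controller can regulate a Lagrange multiplier from the measured safety cost. The class name, gains, and cost limit are hypothetical, and the clipping choices are one common convention rather than the authors' scheme.

```python
class PIDLagrangian:
    """PID control of a Lagrange multiplier for a constraint cost <= cost_limit."""

    def __init__(self, kp, ki, kd, cost_limit):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.cost_limit = cost_limit
        self.integral = 0.0      # accumulated constraint violation, kept non-negative
        self.prev_cost = None    # last observed cost, used for the derivative term

    def update(self, cost):
        """Return the new multiplier given the latest average safety cost."""
        error = cost - self.cost_limit
        self.integral = max(0.0, self.integral + error)
        if self.prev_cost is None:
            derivative = 0.0
        else:
            # Only a rising cost adds penalty; a falling cost is not rewarded.
            derivative = max(0.0, cost - self.prev_cost)
        self.prev_cost = cost
        lam = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(0.0, lam)     # multipliers must stay non-negative
```

The returned multiplier would then weight the safety cost in the policy objective, for example as reward minus multiplier times cost. The proportional and derivative terms let the penalty react to violations faster than a purely integral (plain gradient-ascent) update.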

2606.15284 2026-06-16 eess.SP cs.AI cs.LG 新提交

CAP: Towards PPG Universal Representation Learning with Patient-level Supervision

CAP:面向患者级监督的PPG通用表示学习

Chenyang He, Xinyi Shao, Shun Huang, Bosong Huang, Daoqiang Zhang, Ming Jing, Cheng Ding

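Affiliations: Nanjing University of Aeronautics and Astronautics; Peking University; Independent Researcher; Jinling Clinical Medical College; College of Artificial Intelligence, Nanjing University of Aeronautics and Astronautics

AI summary: Proposes CAP, which builds a large-scale paired PPG-EHR multimodal dataset and uses cross-modal contrastive alignment to learn PPG representations anchored to patient-level clinical semantics, with an average relative improvement of 26.7% across four downstream tasks and up to 87.6% on respiratory rate prediction.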

Comments Accepted as an Oral presentation at KDD 2026

详情
AI中文摘要

光电容积描记法(PPG)在可穿戴健康监测和临床决策支持中发挥着核心作用。然而,现有的通用PPG表示学习方法主要关注信号级目标,往往忽略患者级健康背景,这限制了对复杂临床任务和异质性队列的泛化能力。为解决这一问题,我们通过将碎片化的病史和临床记录整合为连贯的患者级电子健康记录(EHR),构建了一个大规模配对PPG-EHR多模态数据集。基于此资源,我们提出了临床锚定预训练方法(CAP)。在预训练期间,CAP执行跨模态对比对齐,将PPG表示锚定到患者级临床语义,引导编码器超越波形拟合,建模患者整体生理状态的一致性。在下游适应期间,预训练的PPG编码器提供临床基础的表示,增强归纳偏置,提高鲁棒性和可迁移性。实验表明,CAP在四个不同的下游任务上持续优于强基线。CAP在呼吸率预测上取得了特别大的提升(相比最先进基线相对提升高达87.6%),并在所有任务上平均相对提升26.7%。我们通过全面分析(包括消融实验和多个互补的可视化学习表示)进一步增强了方法的可解释性。实验代码可在 https://github.com/gody123gody/CAP 获取。

英文摘要

Photoplethysmography (PPG) plays a central role in wearable health monitoring and clinical decision support. Yet existing approaches to universal PPG representation learning largely focus on signal-level objectives and often overlook patient-level health context, which limits generalization to complex clinical tasks and heterogeneous cohorts. To address this gap, we construct a large-scale paired PPG-EHR multimodal dataset by distilling fragmented medical histories and clinical records into cohesive, patient-level electronic health records (EHR). Building on this resource, we propose Clinical Anchored Pretraining for PPG (CAP). During pretraining, CAP performs cross-modal contrastive alignment that anchors PPG representations to patient-level clinical semantics, guiding the encoder beyond waveform fitting toward modeling consistency in a patient's overall physiological state. During downstream adaptation, the pretrained PPG encoder provides clinically grounded representations that strengthen inductive bias and improve robustness and transferability. Experiments demonstrate that CAP consistently outperforms strong baselines on four diverse downstream tasks. CAP achieves a particularly large gain on respiratory rate prediction (up to +87.6% relative improvement over the state-of-the-art baseline) and delivers an average relative +26.7% across all tasks. We further enhance the interpretability of our approach through comprehensive analyses, including ablations and multiple complementary visualizations of the learned representations. The code for our experiments is available at: https://github.com/gody123gody/CAP .

2606.15277 2026-06-16 cs.IR cs.AI cs.DB cs.ET cs.LG 新提交

Guiding Federated Graph Recommendation with LLM-encoded knowledge

利用LLM编码知识指导联邦图推荐

Thi Minh Chau Nguyen, Hien Trang Nguyen, Duc Anh Nguyen, Van Ho-Long, Thanh Trung Huynh, Zhao Ren

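AI summary: To address the difficulty of aligning graph representations across clients in federated graph recommendation, the paper uses semantic signals encoded by a large language model to guide selective aggregation of structural representations, improving recommendation accuracy.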

Comments Technical Report

详情
AI中文摘要

基于图的推荐系统在从用户-物品交互中提取协同信号方面非常有效,联邦学习(FL)则可以在保护用户隐私的同时训练这些模型。然而,跨分布式、非独立同分布(non-IID)客户端聚合图表示仍然是一个挑战;局部学习的结构嵌入常常不对齐,简单的平均无法捕捉有意义的跨客户端关系。大多数现有的联邦图方法仅依赖结构聚合,忽略了大型语言模型(LLM)中丰富的全局语义上下文。在本文中,我们提出了一种新颖的框架,利用LLM编码的知识来指导联邦图推荐。具体来说,客户端从局部图中学习结构表示,同时通过冻结的LLM将其典型交互模式总结为紧凑的语义向量。中央服务器随后利用这些LLM编码的语义信号发现跨客户端的相关偏好模式,指导其结构表示的选择性聚合。这实现了语义感知的跨客户端协作,而无需暴露原始数据。在标准基准上的大量实验表明,利用LLM编码知识指导结构对齐一致地提高了现有联邦图基线的推荐准确性。

英文摘要

Graph-based recommender systems are highly effective at extracting collaborative signals from user--item interactions, and federated learning (FL) allows these models to be trained while preserving user privacy. However, aggregating graph representations across distributed, non-IID clients remains a challenge; structural embeddings learned locally often misalign, and naive averaging fails to capture meaningful cross-client relationships. Most existing federated graph methods rely exclusively on structural aggregation, neglecting the rich, global semantic context available in large language models (LLMs). In this paper, we propose a novel framework that uses LLM-encoded knowledge to guide federated graph recommendation. Specifically, clients learn structural representations from local graphs while simultaneously summarizing their typical interaction patterns into compact semantic vectors via a frozen LLM. The central server then uses these LLM-encoded semantic signals to discover related preference patterns across clients, guiding the selective aggregation of their structural representations. This enables semantically informed cross-client collaboration without exposing raw data. Extensive experiments on standard benchmarks show that guiding structural alignment with LLM-encoded knowledge consistently improves recommendation accuracy over existing federated graph baselines.

2606.15271 2026-06-16 math.OC cs.LG 新提交

Dual-Network PINNs for Optimal Control: A Reproducible Benchmark on the Mass-Spring-Damper System

双网络PINNs用于最优控制:质量-弹簧-阻尼器系统的可复现基准

Abdeladhim Tahimi, Rinaldo Vieira da Silva Junior

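Affiliations: Centro de Engenharias e Ciências Agrárias, Universidade Federal de Alagoas, Brazil

AI summary: Proposes a dual-network physics-informed neural network (PINN) that directly solves the optimal control problem for a mass-spring-damper system: the state network satisfies the boundary conditions exactly, the control network is unconstrained, and the loss combines the physics residual with a trapezoidal approximation of the cost functional. On the benchmark it reproduces the classical optimal cost to four significant digits.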

Comments 22 pages, 6 figures. Reproducible benchmark study of dual-network Physics-Informed Neural Networks (PINNs) for optimal control of a mass-spring-damper system. Includes comparison with Pontryagin's Minimum Principle and direct transcription methods and accompanying Google Colab implementation

详情
AI中文摘要

本文提出了一个透明且可复现的基准研究,针对质量-弹簧-阻尼器系统的最优控制,采用直接双网络物理信息神经网络(PINN)公式。经典的线性二次最优控制问题通过两种独立的经典方法求解——Pontryagin最小值原理结合单次打靶法,以及通过梯形配点法的直接转录——并重新表述为一个受约束的优化问题,由两个前馈神经网络求解:一个状态网络,其边界条件通过复合三次和掩码假设精确强制执行;以及一个无约束的控制网络。复合损失结合了配点处的物理残差和成本泛函的梯形近似,并由单个标量超参数加权。在所考虑的基准上,PINN将经典最优成本复现至四位有效数字,精确满足终端状态约束,并产生点态状态和控制误差,这些误差落在两个经典参考的范围内。在此基准上,训练速度比经典打靶法慢大约两个数量级,这是如实报告的。贡献在于方法的清晰性而非方法的新颖性:该公式及附带的Google Colab实现旨在降低实践者探索基于PINN的最优控制的入门门槛,无需预先了解伴随方法或两点边值问题。

英文摘要

This work presents a transparent and reproducible benchmark study of a direct dual-network Physics-Informed Neural Network (PINN) formulation for the optimal control of a mass-spring-damper system. The classical linear-quadratic optimal control problem is solved by two independent classical methods -- Pontryagin's Minimum Principle with single shooting, and direct transcription through trapezoidal collocation -- and recast as a constrained optimization problem solved by two feedforward neural networks: a state network whose boundary conditions are enforced exactly through a composite cubic-and-mask ansatz, and an unconstrained control network. The composite loss combines the physics residual at the collocation points with a trapezoidal approximation of the cost functional, weighted by a single scalar hyperparameter. On the benchmark considered, the PINN reproduces the classical optimal cost to four significant digits, satisfies the terminal state constraints exactly by construction, and produces pointwise state and control errors that fall within the spread of the two classical references. Training is approximately two orders of magnitude slower than classical shooting on this benchmark, which is honestly reported. The contribution is methodological clarity rather than methodological novelty: the formulation and the accompanying Google Colab implementation are intended to lower the barrier to entry for practitioners exploring PINN-based optimal control without prior exposure to adjoint methods or two-point boundary value problems.

2606.15267 2026-06-16 eess.AS cs.SD 新提交

Dynamic Prosody Prediction in LLM-based TTS for Improving Speaker Similarity

基于LLM的TTS中的动态韵律预测以提高说话人相似度

Zhenwei Mou, Liping Chen, Yajun Hu, Zhen-Hua Ling, Xin Fang, Jianqing Gao

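Affiliations: University of Science and Technology of China; iFLYTEK

AI summary: LLM-based TTS ignores style-specific prosodic patterns, which limits speaker similarity; the paper proposes dynamic syllable-level prosody prediction conditioned on previously predicted speech, improving prosody learning and speaker similarity.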

Comments Accepted to INTERSPEECH 2026. 5 pages, 2 figures. Audio samples: https://muzw.github.io/dynapros/

详情
AI中文摘要

个性化文本到语音(TTS)旨在合成语音中克隆目标说话人,模仿其声音和说话风格。当前基于大型语言模型(LLM)的TTS方法忽略了生成语音中风格特定的韵律模式,导致风格学习不足,从而限制了合成语音中的说话人相似度。为此,我们研究了基于合成语音的韵律学习,并提出基于先前预测语音来预测当前音节的韵律。在三个数据集上获得的实验结果表明,所提出的动态韵律预测方法在增强韵律学习能力方面有效,从而提高了生成语音的说话人相似度。音频样本可在 https://muzw.github.io/dynapros/ 获取。

英文摘要

Personalized text-to-speech (TTS) aims to clone the target speaker in the synthesized speech, imitating both the voice and speaking style. Current large language model (LLM)-based TTS methods ignore the style-specific prosodic patterns in generated speech, resulting in deficient style learning and thus limiting speaker similarity in synthesized speech. To this end, we investigate the prosody learning conditioned on the synthesized speech, and propose to predict the prosody of the current syllable based on previously predicted speech. Experimental results obtained on three datasets demonstrated the efficacy of the proposed dynamic prosody prediction method in enhancing the prosody learning capability, thereby improving the speaker similarity of the generated speech. Audio samples are available at https://muzw.github.io/dynapros/.

2606.15264 2026-06-16 eess.AS cs.SD 新提交

DuraMark: Duration-Embedded Watermarking in LLM-based TTS

DuraMark: 基于LLM的文本转语音中的时长嵌入水印

Zhenwei Mou, Weili Jiang, Liping Chen, Zhen-Hua Ling, Kong Aik Lee, Kai Gao, Boyu Zhao

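Affiliations: University of Science and Technology of China; Institute of Forensic Science, Ministry of Public Security; The Hong Kong Polytechnic University

AI summary: Proposes DuraMark, an information-level watermarking framework based on syllable duration editing: a duration-controllable LLM-based TTS model embeds the watermark and a duration extractor detects it, giving robustness against generative attacks.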

Comments Accepted to INTERSPEECH 2026. 5 pages, 1 figure. Audio samples: https://muzw.github.io/duramark_demo/

详情
AI中文摘要

基于大语言模型(LLM)的文本转语音(TTS)模型已实现显著的语音克隆能力,引发了对深度伪造滥用的担忧。语音水印通过将可追溯信息嵌入生成的语音中来缓解这一问题。主流水印方法在信号级(波形或频谱图)操作,使得水印易受生成式攻击(如神经编解码器和声码器)的影响。为解决此问题,我们提出DuraMark,一种鲁棒的信息级水印框架。它利用音节时长编辑实现水印嵌入。具体而言,DuraMark集成了一个时长可控的基于LLM的TTS模型,在合成过程中编辑音节时长,并配以时长提取器提取这些时长用于检测。实验表明,DuraMark对生成式攻击具有优越的鲁棒性,显著优于信号级基线。音频样本可在https://muzw.github.io/duramark_demo/获取。

英文摘要

Large language model (LLM)-based text-to-speech (TTS) models have achieved remarkable voice cloning capabilities, raising concerns about potential deepfake misuse. Speech watermarking mitigates this by embedding traceable information into generated speech. Mainstream watermarking methods operate at the signal level (waveform or spectrogram), rendering the watermark vulnerable to generative attacks (e.g., neural codec and vocoder). To address this, we propose DuraMark, a robust information-level watermarking framework. It utilizes syllable duration editing to achieve watermark embedding. Specifically, DuraMark integrates a duration-controllable LLM-based TTS model to edit syllable durations during synthesis, coupled with a duration extractor to extract these durations for detection. Experiments demonstrate DuraMark's superior robustness against generative attacks, significantly outperforming signal-level baselines. Audio samples are available at https://muzw.github.io/duramark_demo/.

2606.15246 2026-06-16 cs.LO cs.AI cs.DL 新提交

Provenance-Enhanced Statements in Knowledge Graphs

知识图谱中增强来源的陈述

Fabio Vitali, Valentina Pasqual

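Affiliations: University of Bologna

AI summary: Proposes the DEC framework, which uses cognitive modal logics to interpret provenance predicates as indicators of epistemic stance and groups statements into cognitive worlds, enabling provenance-based reasoning without treating disagreements as inconsistencies.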

Comments 33 pages

详情
AI中文摘要

在当代知识图谱中,形式为“根据$X$,$φ$”的增强来源陈述无处不在,尤其是在图内容主要表示主张、解释和假设(\emph{capta})而非观察者独立事实(\emph{data})的领域。当前的来源模型可以记录谁说了什么,但通常将来源视为语义中性的,未充分说明归因陈述与事实承诺、彼此之间以及推理的关系。在本文中,我们引入DEC框架,该框架将来源谓词解释为认知立场指示器,并将来源同质的陈述集分组为\emph{认知世界}。借鉴认知模态逻辑(信念、知识和推测),DEC刻画了认知世界与一个特殊的事实核心(“现实”)之间的局部性、合理性和可控渗透,从而能够对归因内容进行有原则的推理,而不会将分歧视为不一致。我们为RDF数据集形式化了DEC解释,该解释对RDF 1.2语义是保守的,阐明了内涵性和同一性(包括超人悖论)的作用,并在常见的语义网表示(命名图、引用三元组/RDF-star和具体化)上说明了该方法。最后,我们描述了原型DEC推理器,它作为Fuseki数据集模块实现,支持受控事实化以及分歧和错觉的显式检测。

英文摘要

Provenance-enhanced statements of the form "according to $X$, $φ$" are pervasive in contemporary knowledge graphs, especially in domains where graph content primarily represents claims, interpretations, and hypotheses (capta) rather than observer-independent facts (data). Current provenance models can record who asserted what, but they typically treat provenance as semantically neutral, leaving underspecified how attributed claims relate to factual commitment, to one another, and to reasoning. In this paper we introduce DEC, a framework that interprets provenance predicates as indicators of epistemic stance and groups provenance-homogeneous sets of statements into cognitive worlds. Drawing on cognitive modal logics (doxastic, epistemic, and conjectural), DEC characterizes locality, rationality, and controlled permeation between cognitive worlds and a distinguished factual core ("reality"), thereby enabling principled reasoning over attributed content without collapsing disagreements into inconsistencies. We formalize a DEC interpretation for RDF datasets that is conservative over RDF 1.2 semantics, clarify the role of intensionality and identity (including the Superman paradox), and illustrate the approach on common Semantic Web representations (named graphs, quoted triples/RDF-star, and reification). Finally, we describe our prototype DEC reasoner implemented as a Fuseki dataset module, supporting controlled factualisation and explicit detection of disagreements and delusions.

2606.15242 2026-06-16 cs.CR cs.AI 新提交

Benign in Isolation, Harmful in Composition: Security Risks in Agent Skill Ecosystems

孤立无害,组合有害:智能体技能生态系统中的安全风险

Yi Xie, Jiawei Du, Yu Cheng, Jiuan Zhou, Zhaoxia Yin

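Affiliations: East China Normal University; Centre for Frontier AI Research, A*STAR; Shanghai Innovation Institute

AI summary: Introduces the notion of Skill Composition Risk (SCR); the SCR-Bench benchmark shows that when multiple skills execute together in a shared context, attack success rates are far higher than under isolated evaluation, so skill security should be assessed along activated paths.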

详情
AI中文摘要

技能正成为LLM智能体将计划转化为行动的能力层,但其使用引入了数据泄露、未授权操作和工具滥用等安全风险。现有的审查通常孤立评估每个技能,而实际智能体任务常在共享执行上下文中调用多个技能。这产生了技能组合风险(SCR):一个单独看似良性的技能,当其输出、信任信号、授权线索或副作用影响激活路径上的后续调用时,可能变得有害。我们引入SCR-Bench来在受控的沙盒技能环境中评估此风险。SCR-Bench不仅依赖文本意图或表面行为,还记录组合技能执行中的下游状态变化和路径级结果。它包含三个子基准:SCR-CapFlow(能力流组合)、SCR-TrustLift(信任转移组合)和SCR-AuthBlur(授权混淆组合)。在SCR-Bench中,组合路径暴露的风险在孤立评估下基本不存在。在SCR-CapFlow中,组合下的攻击成功率达到33.6%,而孤立基线接近零。在SCR-TrustLift中,五个后端中有四个的攻击成功率超过96.5%。在SCR-AuthBlur中,相对于L1上下文设置下的L0孤立基线,风险批准率增加了71.8%。这些结果表明,智能体技能安全性应在激活路径层面而非孤立工件层面进行评估。SCR和SCR-Bench为LLM智能体技能生态系统中的路径感知风险评估和防御提供了基础。基准测试:https://github.com/saint-viperx/SCR_Bench。

英文摘要

Skills are becoming the capability layer through which LLM agents turn plans into actions, but their use introduces security risks such as data leakage, unauthorized operations, and tool misuse. Existing vetting usually evaluates each skill in isolation, while real agent tasks often invoke multiple skills in a shared execution context. This creates Skill Composition Risk (SCR): a skill that appears benign alone can become harmful when its outputs, trust signals, authorization cues, or side effects influence later invocations along an activated path. We introduce SCR-Bench to evaluate this risk in controlled, sandboxed skill environments. Rather than relying only on textual intent or surface behavior, SCR-Bench records downstream state changes and path-level outcomes across composed skill executions. It contains three sub-benchmarks: SCR-CapFlow for capability-flow composition, SCR-TrustLift for trust-transfer composition, and SCR-AuthBlur for authorization-confusion composition. Across SCR-Bench, composed paths expose risks that are largely absent under isolated evaluation. In SCR-CapFlow, attack success rate reaches 33.6 percent under composition, compared with near-zero isolated baselines. In SCR-TrustLift, attack success rate exceeds 96.5 percent on four of five backends. In SCR-AuthBlur, the risky-approval rate increases by 71.8 percent relative to the L0 isolated baseline under the L1 context setting. These results show that agent skill security should be assessed at the level of activated paths rather than isolated artifacts. SCR and SCR-Bench provide a foundation for path-aware risk evaluation and defense in LLM agent skill ecosystems. Benchmark: https://github.com/saint-viperx/SCR_Bench.

2606.15238 2026-06-16 cs.GR cs.CV 新提交

HairLRM: Strand-based Hair Modeling via Large Reconstruction Models

HairLRM:基于大型重建模型的发丝建模

Yuefan Shen, Yican Dong, Xiufeng Huang, Zhongtian Zheng, Youyi Zheng, Kui Wu

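Affiliations: LIGHTSPEED, Shenzhen, China; State Key Lab of CAD and CG, Zhejiang University, Hangzhou, China; Hong Kong Baptist University, Hong Kong, China; LIGHTSPEED, Los Angeles, CA, USA

AI summary: Inferring 3D structure from 2D images is ill-posed for traditional strand-based hair modeling; the paper adds geometric priors from Large Reconstruction Models, uses a Dual Orientation AutoEncoder to lift coarse geometry into high-fidelity strands, and resolves vector field singularities through latent-space optimization and surface-guided refinement for robust, accurate hair reconstruction.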

Comments ACM SIGGRAPH 2026 Conference Paper

详情
AI中文摘要

传统基于发丝建模的根本限制不仅仅是数据稀缺,而是在没有结构约束的情况下从2D图像推断复杂3D场的不适定性。这种无约束回归会导致在解决全局遮挡(例如马尾辫)和局部方向性(例如卷发)时出现灾难性失败,产生过度平滑、看似合理但不正确的几何形状。为了解决这个问题,我们将大型重建模型(LRM)的强几何先验集成到发丝生成流程中。使用LRM网格作为结构锚点,我们采用一种新颖的双方向自编码器将粗几何提升为高保真发丝。通过潜在空间优化和表面引导细化解决矢量场奇点,我们的方法有效解缠复杂的拓扑结构,为头发重建的鲁棒性和准确性设立了新的基准。

英文摘要

The fundamental limitation of traditional strand-based modeling is not simply data scarcity, but the ill-posedness of inferring complex 3D fields from 2D imagery without structural constraints. This unconstrained regression leads to catastrophic failures in resolving both global occlusion (e.g., in ponytails) and local directionality (e.g., in curls), resulting in over-smoothed, plausible-but-incorrect geometries. To resolve this, we integrate the strong geometric priors of Large Reconstruction Models (LRMs) into the strand generation pipeline. Using the LRM mesh as a structural anchor, we employ a novel Dual Orientation AutoEncoder to lift coarse geometry into high-fidelity strands. By resolving vector field singularities through latent-space optimization and surface-guided refinement, our method effectively disentangles complex topological structures, setting a new benchmark for robustness and accuracy in hair reconstruction.

2606.15234 2026-06-16 eess.SP cs.CE cs.LG 新提交

Surrogate-Assisted Framework for SI-Compliant Interconnect Design Optimization Using the Earth Mover's Distance

基于推土机距离的SI合规互连设计优化代理辅助框架

Emre Ecik, Werner John, Julian Withöft, Ralf Brüning, Jürgen Götze

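Affiliations: Information Processing Lab, TU Dortmund University; Pyramide2525/TU Dortmund University; EMC Technology Center Paderborn, Zuken GmbH

AI summary: Proposes a deterministic, machine-assisted framework based on the Earth Mover's Distance (EMD): surrogate models predict waveform features, a decision tree screens for SI-compliant designs, and EMD ranks the survivors, giving interpretable and efficient PCB interconnect optimization.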

Comments 16 pages, 15 figures. This manuscript has been submitted to Advances in Radio Science for review (2026)

详情
AI中文摘要

本文提出一种基于推土机距离(EMD)的确定性机器学习辅助框架,用于SI合规的PCB设计。与依赖迭代黑盒搜索过程的传统代理优化方法不同,本方法采用可解释的顺序评估策略。首先使用神经代理模型根据拓扑相关设计参数高效预测波形描述特征。然后,决策树作为物理驱动的质量门,根据预定义的SI标准识别SI合规波形。在得到的有效解空间中,采用推土机距离作为相似性度量,根据候选设计与理想参考信号的接近程度对其进行排序。这不仅能够确定性地识别可接受的参数区域,而且无需逆建模或随机搜索过程即可透明地优先选择物理上更优的解。通过大规模仿真DDR3飞越波形数据集验证了该方法。通过结合代理预测、可解释分类和基于EMD的波形评估,该框架为基于AI方法的PCB开发提供了可解释且计算高效的替代传统优化策略的方案。

英文摘要

This work presents a deterministic, machine-assisted framework for SI-compliant PCB design based on the Earth Mover's Distance (EMD). In contrast to conventional surrogate-based optimization methods that rely on iterative black-box search procedures, the proposed approach follows an interpretable, sequential evaluation strategy. Neural surrogate models are first used to efficiently predict waveform describing features from topology-dependent design parameters. A decision tree then acts as a physically motivated quality gate that identifies SI-compliant waveforms according to predefined SI criteria. Within the resulting valid solution space, the Earth Mover's Distance is employed as a similarity metric to rank candidate designs according to their proximity to an ideal reference signal. This enables not only the deterministic identification of admissible parameter regions but also a transparent prioritization of physically superior solutions without inverse modeling or stochastic search procedures. The methodology is demonstrated using a large-scale set of simulated DDR3 fly-by waveforms. By combining surrogate prediction, interpretable classification, and EMD-based waveform evaluation, the framework provides an explainable and computationally efficient alternative to conventional optimization strategies for supporting PCB development with AI-based methods.
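The ranking step in this abstract can be illustrated with a small sketch. It assumes, purely for illustration, that each waveform is compared through the empirical distribution of its sample amplitudes and that all waveforms have the same number of samples; the abstract does not state how the authors build the distributions, and the function names are hypothetical.

```python
def emd_1d(u, v):
    """Earth Mover's Distance between two equal-size 1D samples.

    For equal-size empirical distributions on the real line this equals
    the mean absolute difference of the sorted values.
    """
    if len(u) != len(v):
        raise ValueError("this sketch assumes equal sample counts")
    return sum(abs(a - b) for a, b in zip(sorted(u), sorted(v))) / len(u)

def rank_by_emd(reference, candidates):
    """Return candidate names ordered from closest to farthest from the reference."""
    return sorted(candidates, key=lambda name: emd_1d(reference, candidates[name]))

# Hypothetical data: an ideal two-level signal and two candidate waveforms
# that are assumed to have passed the SI-compliance gate already.
ideal = [0.0, 0.0, 1.0, 1.0]
passed = {
    "design_A": [0.0, 0.1, 0.9, 1.0],
    "design_B": [0.3, 0.4, 0.6, 0.7],
}
ranking = rank_by_emd(ideal, passed)
```

Because the ranking is a plain sort on a distance, it is deterministic and needs no iterative search, which matches the sequential evaluation strategy the abstract describes.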

2606.15217 2026-06-16 stat.ML cs.LG New submission

Conformal Candidate Certification for Offline Model-Based Optimization

Seungjin Choi

Affiliations * Seungjin Choi

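AI summary: Proposes Conformal Candidate Certification (CCC), which uses weighted conformal prediction to attach a calibrated one-sided lower bound to each candidate design in offline model-based optimization and certifies only candidates whose bound exceeds the target threshold, addressing statistical reliability under distribution shift.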

Comments ICML 2026 Workshop on Decision-Making from Offline Datasets to Online Adaptation: Black-Box Optimization to Reinforcement Learning

Details
English abstract

Offline model-based optimization (MBO) proposes candidates by optimizing a surrogate trained on a fixed historical dataset. Because candidates are deliberately out-of-distribution, surrogate rankings are least reliable exactly where the optimizer is most aggressive, yet existing methods provide no per-candidate statistical certificate that a design meets a target threshold. We propose Conformal Candidate Certification (CCC), a post-hoc wrapper that attaches a calibrated one-sided lower bound to each candidate and advances only those whose bound exceeds the target. We show that entropy-regularized surrogate maximization induces a Gibbs-tilted proposal, so the same surrogate supplies importance weights for weighted conformal prediction without a separate density-ratio estimation step. In a controlled synthetic study, CCC certifies 16.7% of an aggressive proposal pool with empirical coverage 0.990 at nominal 0.90, while standard conformal prediction ignoring the covariate shift collapses to 0.416 coverage.
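
Note: the following is a minimal sketch of a one-sided lower bound from weighted split conformal prediction, included only to make the certification idea concrete. It assumes importance weights of the form exp(mu(x) / tau), reading the abstract's "Gibbs-tilted proposal" as a tilt by the surrogate prediction mu with a temperature tau. The names `certified_lower_bound`, `tau`, and the toy calibration data are hypothetical and are not taken from the paper.

```python
import math


def weighted_quantile_with_inf(scores, weights, test_weight, level):
    """Weighted quantile of the calibration scores, with the test point's
    weight placed on +infinity as in weighted conformal prediction."""
    total = sum(weights) + test_weight
    cumulative = 0.0
    for score, weight in sorted(zip(scores, weights)):
        cumulative += weight / total
        if cumulative >= level:
            return score
    return math.inf  # calibration mass alone never reaches the level


def certified_lower_bound(mu_test, cal_mu, cal_y, tau, alpha):
    """Lower bound L with target coverage P(Y >= L) >= 1 - alpha.

    Scores measure how much the surrogate over-predicts on calibration data.
    """
    scores = [m - y for m, y in zip(cal_mu, cal_y)]
    # Subtracting a common shift keeps exp() stable; it cancels on normalization.
    shift = max(max(cal_mu), mu_test)
    weights = [math.exp((m - shift) / tau) for m in cal_mu]
    test_weight = math.exp((mu_test - shift) / tau)
    q = weighted_quantile_with_inf(scores, weights, test_weight, 1 - alpha)
    return mu_test - q


# Toy calibration set: surrogate predicts 1.0 everywhere, outcomes vary.
cal_mu = [1.0] * 10
cal_y = [1.0 - 0.1 * i for i in range(10)]

bound_mild = certified_lower_bound(1.0, cal_mu, cal_y, tau=0.5, alpha=0.2)
bound_aggressive = certified_lower_bound(5.0, cal_mu, cal_y, tau=0.5, alpha=0.2)

target = 0.1  # hypothetical target threshold
print(bound_mild >= target, bound_aggressive >= target)
```

In this toy setup the aggressive candidate receives almost all of the normalized weight, so the quantile is infinite and the bound is uninformative: a candidate far outside the calibration support cannot be certified.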

2606.15206 2026-06-16 econ.TH cs.AI New submission

AI Contagion in Social Networks

Olivier Bos, Stefano Bosi

Affiliations * Université Paris-Saclay, ENS Paris-Saclay, Centre for Economics at Paris-Saclay; Université Paris-Saclay, Université d’Evry Paris-Saclay, Centre for Economics at Paris-Saclay, EPEE

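AI summary: Studies how the interaction between AI and social networks affects the stability of collective knowledge. With two feedback forces, an AI contagion channel and an AI social distortion multiplier, the system's long-run behavior admits a two-dimensional representation whose spectral radius determines stability. The paper characterizes the minimum filtering required for stability and shows how network topology shapes systemic informational risk.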

Comments 49 pages, 2 figures (coded in LaTeX)

Details
English abstract

We study how artificial intelligence (AI) interacts with social communication networks to shape the stability of collective knowledge. Agents exchange information through a network while receiving AI-generated content, and AI systems retrain on the aggregate social information they influence. This interaction generates two feedback forces: an AI contagion channel, through which distortions diffuse across the network, and an AI social distortion multiplier, through which retraining amplifies past errors. Despite the high dimensionality of the environment, we show that the long-run behavior of the system admits a two-dimensional representation whose spectral radius determines whether AI-mediated information systems are dynamically stable or unstable. We characterize a sharp regulatory frontier identifying the minimum filtering required for stability and show how network topology shapes systemic informational risk.
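
Note: the abstract does not give the two-dimensional system itself, so the sketch below only illustrates the general criterion it relies on: a linear map x_{t+1} = A x_t is dynamically stable when the spectral radius of A is below 1. The matrix entries are invented, and the assumption that filtering scales the whole map by (1 - f) is purely hypothetical, chosen to show how a minimum filtering level could follow from a spectral radius.

```python
import cmath


def spectral_radius_2x2(a, b, c, d):
    """Largest eigenvalue modulus of the matrix [[a, b], [c, d]]."""
    trace = a + d
    det = a * d - b * c
    disc = cmath.sqrt(trace * trace - 4 * det)  # complex sqrt handles rotations
    return max(abs((trace + disc) / 2), abs((trace - disc) / 2))


def is_stable(a, b, c, d):
    """x_{t+1} = A x_t converges to zero iff the spectral radius is below 1."""
    return spectral_radius_2x2(a, b, c, d) < 1


def min_uniform_filter(a, b, c, d):
    """Smallest f in [0, 1) such that (1 - f) * A sits on the stability
    boundary, under the assumed uniform-scaling model of filtering."""
    rho = spectral_radius_2x2(a, b, c, d)
    return max(0.0, 1 - 1 / rho) if rho > 0 else 0.0


# Invented reduced-form matrix; not taken from the paper.
A = (0.6, 0.5, 0.4, 0.7)
rho = spectral_radius_2x2(*A)
f_min = min_uniform_filter(*A)
print(rho, is_stable(*A), f_min)
```

Under this assumed scaling, any filtering level strictly above `f_min` brings the spectral radius below 1.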

2606.15187 2026-06-16 eess.AS cs.SD New submission

VoxWatermark: A Large-Scale Benchmark for Audio Watermark Detection under Perturbations

Farnaz Sedaghati, Yuxi Wang, Zicheng Weng, Wei Rao

Affiliations * University of Tehran, Iran; Nanyang Technological University, Singapore

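AI summary: To address the lack of a unified benchmark, builds VoxWatermark, which covers 10 watermarking methods and three perturbation types (no-box, black-box, and white-box), and proposes the robust baseline detector AudioWMD, validating its effectiveness and scalability.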

Comments Accepted by Interspeech 2026

Details
English abstract

With the rapid deployment of speech generation systems in open environments, providing verifiable source attribution and copyright accountability for audio content has become critical. A gap in current research is the lack of a unified benchmark that systematically compares different watermark injection methods under realistic distribution shifts. To address this, we build VoxWatermark by applying 10 watermarking methods (4 neural and 6 traditional) with unified injection and annotation on multilingual, multi-source corpora, and introducing no-box, black-box, and white-box perturbations to simulate real recording and transmission conditions. Based on this benchmark, we propose AudioWMD as a robust baseline detector for large-scale, multi-method, cross-distribution settings. Results show that injection-method diversity and distribution shifts affect detection stability, while validating the effectiveness and scalability of AudioWMD. Dataset and code are publicly available.