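arXivDaily daily academic digest: synchronized with the full arXiv data, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.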

2606.01891 2026-06-02 cs.GR cs.LG

MidSurfNet: Learnable Face Pairing and Interference Implicit Fields for Generalized Mid-surface Abstraction

Li Ye, Xinhang Zhou, Xingyu Yang, Ruofeng Tong, Hailong Li, Peng Du, Min Tang

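Affiliations: College of Computer Science and Technology, Zhejiang University; Shenzhen Poisson Software Co., Ltd.

AI summary: Proposes MidSurfNet, a framework that combines a learnable face pairing module with an interference implicit field to handle mid-surface abstraction of thin-walled CAD models in difficult cases such as multiple wall thicknesses, self-matching faces, and non-center offsets, reaching 87.32% face pairing accuracy.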

Comments 20 pages, 12 figures, 5 tables

Abstract

Mid-surface abstraction is essential for finite element analysis of thin-walled CAD models. Existing face pairing-based methods rely on handcrafted geometric heuristics, yet real-world industrial models frequently exhibit multi-wall-thickness regions, self-matching face configurations, and demand for non-center offset surfaces--scenarios where rule-based approaches consistently fail. We present MidSurfNet, a learning-augmented framework that addresses these limitations through two novel components: (1) a neural face pairing module that learns to predict face pair confidence from geometric and topological features, handling complex pairing scenarios beyond rule-based methods; and (2) an interference implicit field that represents mid-surfaces as the interference of two signed distance functions, enabling generalized offset control for flexible positioning in downstream CAE/FEA-oriented workflows. We construct a large-scale mid-surface dataset containing over 1,500 manually annotated CAD models. Experiments demonstrate that MidSurfNet achieves 87.32% face pairing accuracy and successfully handles multi-wall-thickness (61.90% completion) and self-matching (52.94% completion) scenarios that confound all existing methods. Furthermore, MidSurfNet provides a learning-based approach to generalized mid-surface abstraction with arbitrary offset control for CAE-oriented applications.


2606.01862 2026-06-02 cs.MA cs.AI cs.NI

RadioMaster: Multi-Agent System for Autonomous Radio Signal Generation

Jiazhen Lei, Tianze Cao, Yuxin Sha, Sihan Wang, Bingbing Wang, Fengyuan Zhu, Zeming Yang, Xiaohua Tian

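Affiliations: University of Science and Technology of China

AI summary: Proposes RadioMaster, a fully autonomous multi-agent framework built on three pillars (RadioWiki, RadioAgent, and RadioEmulator) that translates user intent into real wireless signals, addressing the failure of existing models at radio signal generation, which stems from a lack of domain knowledge and insensitivity to hardware constraints.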

Abstract

Translating user intents into physical radio signals represents the critical yet notoriously tedious final step in wireless prototyping, as it requires intricate knowledge of physical layer details and presents immense implementation challenges. Large Language Models (LLMs) and multi-agent systems have revolutionized conventional software engineering, raising the compelling question of whether they can resolve these formidable difficulties. However, our investigations reveal that current models experience significant limitations and fail to accomplish this task when applied to radio signal generation. This performance degradation primarily stems from severe domain ignorance and a fundamental insensitivity to physical hardware constraints. To bridge this gap, we introduce RadioMaster, a fully autonomous multi-agent framework designed to seamlessly translate user input into real-world wireless emissions. RadioMaster operates on three synergistic pillars: RadioWiki for domain-specific knowledge retrieval, RadioAgent for collaborative I/Q sample generation alongside hardware configuration, and RadioEmulator for closed-loop physical layer verification. Furthermore, we construct RadioBench, the first comprehensive benchmark tailored specifically for the radio signal generation domain. Extensive real-world evaluations demonstrate that RadioMaster significantly outperforms state-of-the-art (SOTA) baselines regarding configuration viability and signal fidelity.


2606.01856 2026-06-02 cs.DC cs.AI

Boosting Multimodal Federated Learning via Chained Modality Optimization

Zixin Zhang, Fan Qi, Shuai Li, Xiaoshan Yang, Changsheng Xu

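Affiliations: School of Computer Science and Engineering, Tianjin University of Technology, Tianjin, China; Institute of Automation, Chinese Academy of Sciences, Beijing, China; College of Computer Science, Inner Mongolia University, Hohhot, Inner Mongolia, China

AI summary: Targets modality competition in multimodal federated learning, which leaves the global model suboptimal, and proposes the FedMChain framework, which improves predictive performance and reduces communication overhead through phase-wise optimization, an error-compensated regularizer, and sparse sign-guided aggregation.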

Abstract

Multimodal Federated Learning (MMFL) enables privacy-preserving collaborative learning across decentralized clients with heterogeneous data and modality availability. However, most existing MMFL methods cast multimodal training as a joint optimization problem, overlooking a key bottleneck: modality competition, where dominant modalities suppress weaker ones and lead to suboptimal global models. To address this, we propose FedMChain, a balanced MMFL framework that structures federated multimodal training as a chain of modality-wise phases. This phase-wise design gives each modality a dedicated local optimization window on multimodal clients to mitigate modality competition, and further promotes cross-modal complementarity via an error-compensated regularizer. On the server side, we employ a sparse sign-guided aggregation strategy that leverages directional sign agreement for robust intra-modality aggregation, avoids destructive averaging, and supports less frequent synchronization to reduce communication overhead. Extensive experiments on multimodal benchmarks demonstrate that FedMChain consistently improves predictive performance while requiring less frequent communication than baselines.


2606.01839 2026-06-02 cs.DC cs.AR cs.LG

Observation, Not Prediction: Conversation-Level Disaggregated Scheduling for Agentic Serving

Jianru Ding, Ryien Hosseini, Pouya Mahdi Gholami, Mingyuan Xiang, Henry Hoffmann

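Affiliations: Anonymous Authors

AI summary: Proposes raising the scheduling unit from a single turn to the whole conversation, exploiting the observable two-phase structure of a conversation (a compute-bound first turn followed by a memory-bound tail) to perform disaggregated scheduling without prediction, which substantially lowers latency and improves energy efficiency.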

Abstract

LLM-based agents resolve a user task through many turns of dependent inference and tool calls, producing a workload whose total cost is unknown when the task arrives. Existing multi-turn systems keep the turn as the scheduling unit and decide, turn by turn, whether to disaggregate prefill from decode. That decision rests on the turn's decode length, tool behavior, and KV growth, quantities that are not observable when the scheduler must act, forcing the system to predict them. We show this dependence on prediction is imposed by the scheduling unit, not the workload. Raising the scheduling unit from the turn to the conversation converts turn-level irregularity into a stable, two-phase structure: 1) a compute-bound turn-1 prefill followed by 2) a long, memory-bound tail. Thus, with the conversation as the scheduling unit, placement reduces to reading the first-turn input length and per-decoder KV occupancy, both directly observable. We instantiate this principle in ConServe, which routes the first-turn prefill to a high-throughput prefiller, transfers the KV cache exactly once, and pins the conversation to a single decoder for its entire tail, with no learned model of decode-side cost. Against a per-turn prediction baseline, ConServe reduces p95 time-to-first-effective-token (the latency of a conversation's first user-visible output) by 51.08% and improves energy efficiency by 7.51% while preserving last-turn TBT and SLOs; mapping the two phases onto heterogeneous GPU tiers adds a further 22.75% in energy efficiency.
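The placement rule described in the abstract (read the first-turn input length and per-decoder KV occupancy, then pin the conversation to one decoder) can be illustrated with a minimal sketch. This is not ConServe's implementation: the `Decoder` class, the token-based capacity unit, and the least-occupied tie-breaking rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Decoder:
    """Hypothetical decoder instance tracked by the scheduler."""
    name: str
    kv_capacity: int                      # KV-cache capacity in tokens (assumed unit)
    kv_used: int = 0
    conversations: set = field(default_factory=set)

    @property
    def occupancy(self) -> float:
        return self.kv_used / self.kv_capacity


def place_conversation(conv_id, first_turn_tokens, decoders):
    """Pin a conversation to one decoder using only observable quantities."""
    # Keep only decoders with room for the first-turn KV cache.
    fits = [d for d in decoders if d.kv_capacity - d.kv_used >= first_turn_tokens]
    if not fits:
        raise RuntimeError("no decoder has room for this conversation")
    # Assumed policy: pick the least-occupied decoder; nothing is predicted.
    target = min(fits, key=lambda d: d.occupancy)
    target.kv_used += first_turn_tokens
    target.conversations.add(conv_id)
    return target


decoders = [
    Decoder("dec-0", kv_capacity=10_000, kv_used=6_000),
    Decoder("dec-1", kv_capacity=10_000, kv_used=2_000),
]
chosen = place_conversation("conv-42", first_turn_tokens=3_000, decoders=decoders)
print(chosen.name, chosen.kv_used)  # dec-1 5000
```

Both inputs to `place_conversation` are known when the conversation arrives, which is the point the abstract makes about observation replacing prediction.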


2606.01828 2026-06-02 cs.MA cs.AI

Dynamic Trust-Aware Sparse Communication Topology for LLM-Based Multi-Agent Consensus

Wanshuang Gou, Zihan Liu

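Affiliations: University of Science and Technology of China

AI summary: Proposes DySCo, a dynamic sparse consensus mechanism that uses trust-aware edge selection to reduce communication overhead while preserving consensus quality.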

Comments 11 pages, 3 figures, 5 tables

Abstract

Large language model-driven multi-agent systems enhance the reliability of complex reasoning tasks through multi-round deliberation, role specialization, and cross-validation. However, existing multi-agent debate and collaboration frameworks typically adopt fully connected communication, causing the number of messages, token costs, and end-to-end latency to grow approximately quadratically with the number of agents; although fixed sparse topologies reduce overhead, they cannot adapt communication relationships to different task instances or intermediate reasoning states, making them prone either to preserving low-value interactions or to losing critical error-correction information. To address this problem, this paper proposes DySCo (Dynamic Sparse Consensus), a dynamic trust-aware sparse consensus mechanism. In each round of reasoning, DySCo estimates the value of communication edges based on agent reliability, answer divergence, and task relevance, and selects a small number of high-value edges for message exchange under budget constraints; it then aggregates the answers of different agents through dynamic trust weights and terminates the discussion early once consensus stabilizes. This mechanism replaces universal broadcasting with on-demand communication, thereby reducing communication overhead while preserving essential cross-validation information. We further present analyses of communication complexity and consensus stability, and evaluate the performance of DySCo on mathematical reasoning, logical reasoning, and factual question-answering tasks.
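One round of budgeted edge selection followed by trust-weighted aggregation might look like the sketch below. The abstract does not give DySCo's scoring function, so the product `reliability * relevance * divergence`, the 0/1 divergence measure, and all function names here are hypothetical stand-ins.

```python
from collections import defaultdict
from itertools import permutations


def select_edges(answers, reliability, relevance, budget):
    """Return up to `budget` directed edges (sender, receiver) with the highest value."""
    scored = []
    for src, dst in permutations(answers, 2):
        # Assumed divergence: 1 if the two agents currently disagree, else 0.
        divergence = 1.0 if answers[src] != answers[dst] else 0.0
        value = reliability[src] * relevance[src] * divergence
        scored.append((value, src, dst))
    scored.sort(key=lambda item: item[0], reverse=True)
    # Zero-value edges are dropped even if budget remains.
    return [(src, dst) for value, src, dst in scored[:budget] if value > 0]


def trust_weighted_answer(answers, trust):
    """Aggregate answers by summing the trust weight behind each distinct answer."""
    totals = defaultdict(float)
    for agent, answer in answers.items():
        totals[answer] += trust[agent]
    return max(totals, key=totals.get)


answers = {"a": "12", "b": "12", "c": "15"}
reliability = {"a": 0.9, "b": 0.6, "c": 0.4}
relevance = {"a": 1.0, "b": 1.0, "c": 1.0}

edges = select_edges(answers, reliability, relevance, budget=2)
consensus = trust_weighted_answer(answers, trust=reliability)
print(edges)      # [('a', 'c'), ('b', 'c')]
print(consensus)  # 12
```

With three agents a full broadcast would use six directed edges; the budget of two keeps only the messages from the more reliable agents to the dissenting one.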


2606.01827 2026-06-02 math.OC cs.LG stat.ML

Adaptive Sharpness-Aware Minimization with a Polyak-type Step size: A Theory-Grounded Scheduler

Dimitris Oikonomou, Nicolas Loizou

Affiliations: Mathematical Institute for Data Science (MINDS), Johns Hopkins University, Baltimore, MD, USA; Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA; Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, USA

AI summary: To address the sensitivity of Sharpness-Aware Minimization (SAM) to the learning rate, and inspired by stochastic Polyak step sizes, the paper proposes Polyak schedulers for SAM, obtains adaptive algorithms in both deterministic and stochastic settings, and proves convergence; experiments show performance comparable to or better than tuned SAM baselines.

Comments 43rd International Conference on Machine Learning (ICML 2026)

Abstract

Sharpness-Aware Minimization (SAM) has established itself as a powerful and widely adopted optimizer for training machine learning models. By explicitly minimizing the sharpness of the loss landscape, SAM often improves generalization while delivering strong empirical performance. However, SAM and its variants, like most training algorithms, are sensitive to the choice of learning rate, which is typically selected through extensive hyperparameter tuning or predefined schedulers. In this work, motivated by recent advances on the effectiveness of stochastic Polyak step sizes for Stochastic Gradient Descent (SGD), we derive Polyak schedulers tailored to SAM-style updates, yielding novel adaptive algorithms in both deterministic and stochastic settings. In the smooth setting, we prove linear convergence for strongly convex objectives and an $\mathcal{O}(1/T)$ convergence rate for convex objectives in the deterministic case. In the stochastic setting, we establish analogous convergence guarantees up to a neighborhood of the optimum. Numerical experiments demonstrate that the proposed Polyak schedulers achieve performance comparable to or better than carefully tuned SAM baselines, while substantially reducing the need for learning-rate tuning.
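For background, the classical Polyak step size for plain gradient descent is $\gamma_t = (f(x_t) - f^*) / \|\nabla f(x_t)\|^2$, which needs the optimal value $f^*$ but no tuned learning rate. The sketch below applies it to a small convex quadratic with $f^* = 0$. It shows only this classical rule; the SAM-specific schedulers derived in the paper are not given in the abstract and are not reproduced here, and the test function is an assumption chosen for illustration.

```python
def f(x):
    # Convex quadratic with minimum value f* = 0 at the origin.
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)


def grad(x):
    return [x[0], 10.0 * x[1]]


def polyak_gd(x0, f_star=0.0, steps=200):
    """Gradient descent where each step size is set by the classical Polyak rule."""
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        g_norm_sq = sum(gi * gi for gi in g)
        if g_norm_sq == 0.0:
            break  # already at a stationary point
        step = (f(x) - f_star) / g_norm_sq  # classical Polyak step size
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


x_final = polyak_gd([5.0, 1.0])
print(f([5.0, 1.0]), f(x_final))
```

No learning rate appears anywhere in `polyak_gd`; the step adapts to the current suboptimality and gradient norm, which is the property the paper carries over to SAM-style updates.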


2606.01816 2026-06-02 q-bio.BM cs.LG

Site4Drug: Predicting Drug-Binding Target Sites with an AI Agent

Taehan Kim, Sarrah Rose Mikhail Leung, Bharat Mekala, Jeongbin Park

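Affiliations: University of California, San Diego

AI summary: Proposes Site4Drug, a modality-aware site-finding agent that integrates evidence such as topology, hydropathy, and post-translational modifications to output a ranked list of targetable regions with constraints, risk flags, and a decision log, and that automatically recommends a binding modality.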

Comments Accepted to the ICML 2026 Workshop on Generative and Agentic AI for Biology (GenBio)

Abstract

Selecting where to intervene on a protein (i.e., choosing a targetable site) is often a more ambiguous and failure-prone bottleneck than selecting what binds, especially for membrane proteins where accessibility, topology, and post-translational modifications (PTMs) constrain actionable regions. We present Site4Drug, a modality-aware site-finding agent that outputs a ranked list of targetable regions with explicit constraints, evidence summaries, risk flags, and a traceable decision log. Rather than requiring users to specify the drug modality upfront, Site4Drug can recommend a binding modality (e.g., antibody/peptide-like vs small-molecule) from the same evidence used for site discovery, including topology, hydropathy, PTM propensity, disulfides, domain context, and sequence. Importantly, this evidence is applied consistently across modalities, including small-molecule pocket discovery, to avoid selecting chemically plausible but biologically occluded sites.


2606.01783 2026-06-02 cs.IR cs.AI

Breaking the Information Silo: Semantic Personas for Cross-Domain Recommendation

Jonathan Mayo, Moshe Unger, Konstantin Bauman

Affiliations: Technology and Information Management Department, Coller School of Management, Tel Aviv University; Management Information Systems Department, Fox School of Business, Temple University

AI summary: Proposes SPHERE, which uses large language models to generate semantic personas, enabling cross-domain recommendation with no shared users or items, and strengthens recommendation performance through a dual-tower architecture and a dynamic fusion gate.

Abstract

Digital platforms increasingly operate as isolated information silos, limiting their ability to construct comprehensive user representations across domains. Cross-domain recommender systems seek to overcome this limitation by transferring knowledge from a source domain to a target domain, yet most existing approaches depend on shared users, shared items, or structurally similar interaction graphs. These assumptions are often unrealistic across independent platforms. We propose SPHERE (Semantic Personas for Heterogeneous cross-domain Recommendation), a design artifact that enables recommendation knowledge transfer across strictly disjoint domains with no shared users or items. Rather than aligning domains through identity or graph structure, SPHERE uses large language models to induce a shared behavioral vocabulary, generate structured semantic personas for users, and retrieve behaviorally similar source-domain communities that form a Community Source Persona. This semantic signal is integrated with collaborative signals through a dual-tower architecture and dynamic fusion gate, allowing SPHERE to augment standard recommender backbones. Empirical evaluation across Amazon Books, Goodreads, and Steam demonstrates consistent improvements over NCF, SVD++, and LightGCN baselines under full-ranking evaluation. The results show that cross-domain transfer effectiveness is not determined solely by semantic proximity between domains; rather, it depends critically on the structural density and native predictive strength of the target domain. The study contributes to information systems research by reframing cross-domain personalization as behavior-based semantic alignment, offering a practical mechanism for overcoming information silos while preserving interpretability and modularity.


2606.01764 2026-06-02 math.OC cs.GT cs.LG

Accelerating Min-Max Optimization via Power-Law Stepsizes

Yue Wu, Weiqiang Zheng, Yang Cai, Haipeng Luo

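Affiliations: University of Southern California; Yale University

AI summary: Proposes a deterministic dynamic stepsize schedule that accelerates the last-iterate convergence rate of the extragradient method from $\Theta(T^{-1/2})$ to $\mathcal{O}(T^{-2/3+\varepsilon})$, and shows that using separate stepsizes for the extrapolation and update steps further reaches the near-optimal $\mathcal{O}(T^{-1+\varepsilon})$.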

Comments 56 pages

Abstract

We revisit the convergence guarantees of the Extragradient (EG) method for unconstrained biaffine min-max optimization. It is known that EG with a fixed stepsize achieves a $\Theta(T^{-1/2})$ last-iterate convergence rate, which is slower than the optimal $\mathcal{O}(T^{-1})$ rate attainable by incorporating additional mechanisms such as anchoring. Motivated by recent advances showing that dynamic stepsizes alone can significantly accelerate gradient descent, we ask whether dynamic stepsizes can similarly accelerate the last-iterate convergence of EG. We present the first positive result in this direction. Specifically, we provide a deterministic dynamic stepsize schedule that accelerates the convergence rate of EG to $\mathcal{O}(T^{-2/3+\varepsilon})$ for any $\varepsilon > 0$. We also show that this rate is tight when the extrapolation and update steps of EG use the same stepsize. We then show that allowing different stepsizes for the extrapolation and update steps further improves the convergence rate to the near-optimal $\mathcal{O}(T^{-1+\varepsilon})$. Our analysis reduces stepsize scheduling to an optimization problem, whose solution leads to a stepsize schedule that follows (a discretization of) a power-law distribution. Our proposed stepsize schedules and analysis extend to other methods, such as Optimistic Gradient (OG), and suggest broader applicability to general min-max optimization problems.
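The two-stepsize structure of EG mentioned in the abstract can be sketched on the scalar bilinear problem $\min_x \max_y a\,x\,y$, a special case of the biaffine setting. The function below takes one stepsize sequence for the extrapolation step and another for the update step. The constant sequences used in the example are placeholders only: the paper's power-law schedule is not specified in the abstract and is not reproduced here.

```python
def extragradient(a, x0, y0, extrap_steps, update_steps):
    """Extragradient on f(x, y) = a * x * y with separate stepsize sequences."""
    x, y = float(x0), float(y0)
    for alpha, beta in zip(extrap_steps, update_steps):
        # Extrapolation step (stepsize alpha): descend in x, ascend in y.
        x_half = x - alpha * a * y
        y_half = y + alpha * a * x
        # Update step (stepsize beta): gradients are evaluated at the half point.
        x, y = x - beta * a * y_half, y + beta * a * x_half
    return x, y


T = 200
x_T, y_T = extragradient(a=1.0, x0=1.0, y0=1.0,
                         extrap_steps=[0.5] * T, update_steps=[0.5] * T)
distance = (x_T ** 2 + y_T ** 2) ** 0.5  # distance to the saddle point (0, 0)
print(distance)
```

With both stepsizes fixed at 0.5, each iteration multiplies the squared distance to the saddle point by $(1 - \alpha\beta)^2 + \beta^2 = 0.8125$, so the last iterate converges on this toy problem. Passing different sequences for `extrap_steps` and `update_steps` is where a decoupled schedule would plug in.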


2606.01741 2026-06-02 cs.CR cs.AI

SECUREVENT: Hybrid AI/ML Security Monitoring for Distributed Event-Based Systems

Eric Liang

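Affiliations: Oracle

AI summary: Proposes the SECUREVENT architecture, which combines traditional security mechanisms with online anomaly detection, graph-aware behavioral features, complex-event policy rules, federated learning, and adversarial-ML governance; hybrid AI/CEP monitoring improves recall while keeping the false-positive rate low.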

Abstract

Distributed event-based systems have become a common substrate for Internet-scale publish/subscribe services, IoT telemetry, cloud-native microservices, and security operations pipelines. Their loose coupling and asynchronous delivery improve scalability, but they also expand the attack surface: publishers, brokers, subscribers, topics, schemas, and temporal ordering can each be abused without a single component observing the whole behavior. This paper proposes SECUREVENT, a hybrid AI/ML security-monitoring architecture for distributed event-based systems. The architecture combines traditional protections such as authenticated transport, topic-level authorization, and signed events with online anomaly detection, graph-aware behavioral features, complex-event policy rules, federated learning, and adversarial-ML governance. A deterministic prototype study over synthetic event-stream attacks illustrates how a hybrid AI/CEP monitor can improve recall over static rules while retaining a low false-positive rate. The central claim is not that machine learning replaces cryptographic and access-control mechanisms, but that model-based security monitoring is necessary when event flows, identities, schemas, and timing relationships are too dynamic for static controls alone.


2606.01702 2026-06-02 cs.GR cs.LG

KDH-CAD: Knowledge-data hybrid CAD learning under data scarcity

Ziqin Gao, Zhijie Yang, Qiang Zou

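Affiliations: State Key Laboratory of CAD & CG, Zhejiang University, Hangzhou, 310027, China

AI summary: Proposes the KDH-CAD framework, which fuses pretrained foundation models, structured domain knowledge, and a small amount of labeled CAD data to classify mechanical parts under data scarcity, reaching 92.6% accuracy with 250 samples and 95.8% with 1,000 samples.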

Comments 18 pages

Abstract

Deep learning in computer-aided design (CAD) remains fundamentally constrained by the data scarcity challenge: authentic CAD data is difficult to collect at scale, while synthetic data may not faithfully reflect real design practice. Rather than pursuing ever-larger CAD datasets, this paper instead treats CAD learning as a knowledge completion and calibration problem. It introduces KDH-CAD, a knowledge-data hybrid framework that integrates pretrained knowledge in foundation models, structured domain knowledge from textbooks/tutorials, and a very small amount of labeled CAD data. Domain knowledge is used to elicit and complete CAD-relevant concepts that are weakly expressed or under-represented in pretrained foundation models, while labeled CAD data calibrates these concepts in the latent space to account for task-specific geometric variability, without fine-tuning the foundation model. Experiments on real-world mechanical part classification show that KDH-CAD achieves strong performance in low-data regimes, reaching 92.6% accuracy with only 250 training samples, 95.8% with 1,000 samples, and continuing to improve with additional data. This matches or exceeds state-of-the-art performance that typically requires an order of magnitude more data. These results suggest that combining pretrained foundation models with structured domain knowledge can substantially reduce reliance on large-scale CAD datasets, providing a principled and practical direction for data-efficient CAD learning.


2606.01691 2026-06-02 cs.CR cs.LG

IstGPT: LLM-based Anomaly Detection for Spatial-Temporal Graph in Industrial Systems

Yuchen Zhang, Ning Xi, Pengbin Feng, Shigang Liu, Jianfeng Ma, Yulong Shen, Yanan Sun, Xiaolin Zhou

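Affiliations: School of Cyber Engineering, Xidian University; School of Science, Computing and Engineering Technologies, Swinburne University of Technology; School of Computer Science and Technology, Xidian University

AI summary: Proposes IstGPT, described as the first industrial anomaly detection tool to combine large language models with graph learning; it extracts sensor-actuator dependency graphs from multi-modal knowledge and uses improved graph neural networks for real-time anomaly detection, achieving the best F1-scores and eTaF1 on 9 datasets.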

Abstract

Industrial Internet systems face increasing threats from sophisticated industrial control system (ICS) attacks, resulting in critical safety incidents. However, existing tools exhibit limited effectiveness in real-time anomaly detection due to the complex dependencies among sensors and actuators. To tackle this, we present IstGPT, the first industrial anomaly detection tool based on LLMs and graph learning to provide real-time protection against a wide range of ICS attacks. IstGPT achieves fine-grained and precise modeling of spatial-temporal dependencies in industrial cyber-physical systems. It first leverages industrial multi-modal knowledge, including operational data, technical documents, and system diagrams, to extract sensor-actuator dependency graphs via multi-stage prompt engineering. Then, LLM-Optimation iteratively refines the graph based on node accuracy, edge consistency, and logical coherence. Finally, IstGPT integrates improved graph neural networks with an encoder-decoder architecture to detect anomalies via reconstruction errors. We evaluate IstGPT against 12 state-of-the-art baselines on 9 datasets, including 2 public, 6 simulated, and a real-world robotic arm dataset. IstGPT achieves the best F1-scores and eTaF1 (a newer time-aware metric) across all nine datasets. We further discuss the feasibility of deploying IstGPT in real-world industrial scenarios.


2606.01680 2026-06-02 cs.DC cs.LG cs.NI

Don't Let a Few Network Failures Slow the Entire AllReduce

Peiqing Chen, Jiedong Jiang, Nengneng Yu, Yuefeng Wang, Sixian Xiong, Wei Wang, Zaoxing Liu

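Affiliations: University of Maryland, College Park; Utrecht University; Kyoto University

AI summary: To address AllReduce slowdowns caused by network failures, proposes OptCC, an algorithm based on an information-theoretic lower bound whose four-stage pipelined design stays close to fault-free performance even with up to 50% bandwidth loss.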

Abstract

Network failures are among the most frequent hardware faults in large-scale GPU clusters and a leading cause of training-job interruptions. Modern collective communication libraries such as NCCL mitigate network failures by rerouting traffic through surviving NICs on the same server, trading reduced inter-node bandwidth for uninterrupted training. However, the degraded server remains on the critical path of the standard ring algorithm, slowing the entire collective. We present the first information-theoretic lower bound on AllReduce completion time under asymmetric network bandwidth and show that when the straggler retains at least half of its original bandwidth, the unavoidable overhead relative to the fault-free optimum is only O(1/p) for p GPUs. We then design OptCC, a four-stage pipelined AllReduce algorithm that approaches this lower bound. Experiments on SimAI confirm that OptCC closes the gap left by existing fault-tolerant schemes: under practical network failures with up to 50% bandwidth loss, OptCC completes AllReduce within 2-6% of NCCL's fault-free ring performance, whereas the state-of-the-art incurs up to 57% overhead.
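Why one degraded server slows the whole ring can be seen from the textbook bandwidth-only cost model for ring AllReduce, in which each of the $p$ ranks sends $2(p-1)/p$ times the message size and the ring advances at the pace of its slowest link. The sketch below uses that simplified model, ignoring latency and any rerouting; it is not the paper's lower bound and not OptCC, and the message size and bandwidth figures are made-up example values.

```python
def ring_allreduce_time(message_bytes, link_bandwidths):
    """Bandwidth-only estimate of ring AllReduce time, in seconds.

    link_bandwidths holds one entry per rank, in bytes per second.
    Simplifying assumption: the ring is gated by its slowest link.
    """
    p = len(link_bandwidths)
    bytes_per_rank = 2 * (p - 1) / p * message_bytes  # reduce-scatter + all-gather
    return bytes_per_rank / min(link_bandwidths)


healthy = ring_allreduce_time(1e9, [50e9] * 8)
degraded = ring_allreduce_time(1e9, [50e9] * 7 + [25e9])  # one rank at half bandwidth
slowdown = degraded / healthy
print(healthy, degraded, slowdown)
```

In this model a single rank at half bandwidth doubles the completion time for all eight ranks, which is the kind of gap between the degraded ring and the fault-free optimum that the abstract's $O(1/p)$ overhead result addresses.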


2606.01670 2026-06-02 cs.IR cs.AI

Time-Aware Diffusion based on Preference Disentanglement for Generative Recommendation

Bangguo Zhu, Peng Huo, Yuanbo Zhao, Zhicheng Du, Jun Yin, Senzhang Wang

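Affiliations: Central South University; National Super Computing Center; Renmin University of China; Hong Kong Polytechnic University

AI summary: Existing diffusion-based generative recommenders ignore the non-stationary temporal distribution of user preferences. The paper proposes the TDPM framework, which disentangles user preference into a long-term period preference and a short-term point preference and integrates both into the diffusion process, with average improvements of up to 29.21% in HR@20 and 25.45% in NDCG@20 on three datasets.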

Abstract

Recently, Generative Recommenders (GRs) have emerged as a transformative recommendation paradigm by replacing traditional item IDs with semantic indices (SIDs). Owing to the exceptional generative capabilities of diffusion models, a few pioneering works explore developing GRs with diffusion architectures as the backbone. However, a fatal limitation of existing diffusion-based GRs is that the diffusion process applies uniformly to all items within the historical interactions. In contrast, the user preference is shaped by multifaceted time-evolving factors and thus exhibits a non-stationary distribution in the temporal aspect. To bridge this gap, this study proposes a novel GR framework, named TDPM, by designing the time-aware diffusion on SID tokens. Specifically, TDPM explicitly integrates the impact of time-evolving user preferences into the diffusion process. In detail, the user preference is disentangled into (i) the period preference, which remains consistent over a long time-span, and (ii) the point preference, which is triggered by recent focal events. Extensive experiments on three public real-world datasets demonstrate the significant superiority of TDPM over the state-of-the-art baselines. TDPM achieves average improvements of up to 29.21% and 25.45% in terms of HR@20 and NDCG@20, respectively. The ablation study further underscores the necessity of time-aware token diffusion in diffusion-based GRs.


2606.01655 2026-06-02 math.OC cs.AI cs.LG stat.ML

MINTS: Minimalist Thompson Sampling

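Kaizheng Wang

Affiliations Department of IEOR and Data Science Institute, Columbia University

AI summary To address the limitations of Bayesian methods under complex structural constraints, the paper proposes a minimalist Bayesian framework that places a prior only on the location of the optimum and eliminates nuisance parameters via profile likelihood. Instantiated as the MINTS algorithm, it achieves near-optimal non-asymptotic regret guarantees and an exact almost-sure asymptotic regret characterization for mean-constrained multi-armed bandits.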

Comments 29 pages

2606.01652 2026-06-02 eess.SP cs.CV

Physics-Aware Linearized ADMM and Its Unrolling

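Satoshi Takabe, Shunta Arai, Tadashi Wadayama

Affiliations Japan Society for the Promotion of Science (JST), CRONOS

AI summary For inverse problems whose measurement process is governed by a PDE, the paper proposes a physics-aware linearized ADMM algorithm that achieves efficient updates by linearizing the subproblems and trains its internal parameters via deep unrolling. Effectiveness is validated on compressed sensing for optical fiber communication and on image restoration under noisy anisotropic diffusion.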

Comments 5 pages, 3 figures

2606.01645 2026-06-02 stat.ML cs.LG

Self-Regulating Annealing in Heavy-Tailed Diffusion Models

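Keito Wakatsuki, Hideaki Shimazaki

Affiliations Keito Wakatsuki; Hideaki Shimazaki

AI summary The paper proposes a sampler for heavy-tailed diffusion models based on stochastic differential equations, which realizes a self-regulating annealing mechanism through a state-dependent diffusion coefficient in order to improve generation fidelity on heavy-tailed data.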

Comments 6 pages, 3 figures, IJCNN2026

2606.01632 2026-06-02 cs.GT cs.AI

A Framework for Graph-Conditioned Hierarchical Shapley Attribution in Patent Valuation

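Joy Bose

Affiliations Independent Researcher

AI summary Proposes the PatentXAI framework, which models patent valuation as an explainable-AI problem, uses Markov blankets in a knowledge graph to limit coalition size, and applies hierarchical Shapley values for efficient and interpretable profit allocation.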

Abstract

Estimating the economic contribution of a single patent inside a product that embodies tens of thousands of patents is a long-standing unsolved problem in intellectual property economics. We propose PatentXAI, a framework that treats patent valuation as a problem of explainable AI: given a characteristic function v(S) encoding the revenue achievable by patent subset S, a patent's Shapley value measures its fair share of product profit in a way that satisfies efficiency, symmetry, dummy, and additivity. To make computation tractable we restrict each patent's coalition to its Markov Blanket inside a knowledge graph, grounded in the C-SVE conditional independence theorem (Li et al., 2020). Scaling experiments from n=12 to n=100 patents using Pareto-distributed coverage graphs report median Markov Blanket size of 32.9 percent of n at n=100, with 90th-percentile blanket size of 55.2 percent of n, and runtime of 10 milliseconds per patent. Difference against exact ground truth at n=12 is 0.088; difference against a high-sample Monte Carlo reference at n=100 is 0.062 plus or minus 0.003. A dense-component experiment shows that when 80 percent of patents share one component, the blanket correctly expands to cover that dense cluster, and the difference versus reference falls to 0.039 because the pooled computation becomes more accurate on homogeneous portfolios. Profit allocation proceeds hierarchically: exact Shapley distributes total profit among macro-components, then centrality-weighted Shapley distributes each component budget among covering patents. Estimating v(S) from real data is the primary open problem; we distinguish this from the computational contribution and outline a concrete roadmap for empirical validation using public ETSI, USPTO, and Lens.org datasets.
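
The Shapley value that the abstract builds on can be illustrated with a minimal sketch. This is the textbook exact computation over all coalitions, not PatentXAI's Markov-blanket-restricted or hierarchical variant; the three patents and the `toy_revenue` function are hypothetical stand-ins for v(S).

```python
from itertools import combinations
from math import factorial


def shapley_values(players, v):
    """Exact Shapley values for a characteristic function v(frozenset) -> float."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that player i joins.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (v(s | {i}) - v(s))
        phi[i] = total
    return phi


def toy_revenue(s):
    """Hypothetical v(S): additive base revenue plus a synergy between A and B."""
    base = {"A": 10.0, "B": 20.0, "C": 0.0}
    value = sum(base[p] for p in s)
    if "A" in s and "B" in s:
        value += 6.0
    return value


patents = ["A", "B", "C"]
phi = shapley_values(patents, toy_revenue)
print(phi)
```

In this toy game the shares sum to v of the full set (efficiency) and patent C, which adds nothing to any coalition, receives zero (dummy). The loop visits every subset, which is exponential in n and is the cost that coalition restriction is meant to avoid.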


2606.01628 2026-06-02 q-bio.BM cs.AI

Demystifying Multimodal Biomolecular Co-design With Intrinsic Geodesic Coupling

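Keyue Qiu, Xintong Wang, Zhilong Zhang, Hao Zhou, Wei-Ying Ma

Affiliations University of California, Berkeley; Stanford University

AI summary To address the overlooked temporal coupling between modalities in biomolecular co-design, the paper proposes the GeoCoupling framework, which optimizes the temporal coupling of heterogeneous modalities and improves physical validity and diversity in structure-based drug design and unconditional protein design.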

Comments Accepted to ICML 2026

Abstract

Biomolecules such as proteins and small-molecule ligands play a central role in biological systems, arising from the tight interplay between sequence and three-dimensional structure. Recent generative models for biomolecular co-design aim to capture this interplay by jointly modeling coupled modalities. However, existing approaches largely adopt a parallel execution of marginal generative processes, implicitly enforcing fixed synchronous coupling. We argue that a critical but overlooked degree of freedom lies in how these marginal processes are temporally coupled during training and generation, where inappropriate coupling can introduce high-variance supervision and inconsistent intermediate states, affecting modality consistency. To address this, we introduce GeoCoupling, a systematic framework that optimizes for temporal couplings between heterogeneous modalities. Empirical results across structure-based drug design and unconditional protein design demonstrate the learned couplings consistently outperform synchronous and randomly coupled baselines, yielding biomolecules with improved physical validity and diversity.


2606.01596 2026-06-02 math.NA cs.LG cs.NA

Learning Chaotic Dynamics through Second-Order Geometric Supervision

Shinhoo Kang, Hai V. Nguyen, Tan Bui-Thanh

Affiliations Department of Computer Science and Software Engineering, Korea University; Department of Aerospace Engineering and Engineering Mechanics, The Oden Institute for Computational Engineering and Sciences, The University of Texas at Austin

AI summary Proposes model-constrained randomized Jacobian matching, which implicitly enforces second-order consistency at O(d^2) cost and recovers attractor geometry and invariant statistics in chaotic systems.

Comments 37 pages, 15 figures, 6 tables

Abstract

Learning chaotic dynamical systems from data requires more than short-term predictive accuracy: the learned model must preserve the attractor geometry and its invariant statistics. Trajectory (zero-order) and Jacobian (first-order) matching supervise the values and tangent structure of the vector field, but neither constrains how the field bends away from its tangent plane. A model can thus match values and tangents at the supervised states yet curve differently from the truth, remaining locally accurate while drifting toward spurious attractors and distorting long-time statistics. We show that enforcing second-order consistency mitigates these failures, but forming the full Hessian is prohibitive in high dimensions. We propose model-constrained randomized Jacobian matching, which compares the Jacobians of the true and learned vector fields at randomly perturbed inputs. A Taylor expansion shows that the expected randomized Jacobian loss decomposes into the nominal Jacobian mismatch plus a Hessian mismatch scaled by the noise variance, implicitly enforcing second-order consistency at $\mathcal{O}(d^2)$ cost without forming the $\mathcal{O}(d^3)$ Hessian tensor. Using only Jacobian evaluations, the method scales to high dimensions where explicit Hessian matching does not. Numerical experiments confirm that second-order methods are robust. For Lorenz 63, first-order methods produce catastrophic Lyapunov-exponent outliers under minimal temporal supervision, which second-order methods eliminate while recovering the correct attractor. For coupled Lorenz 96, an out-of-distribution forcing sweep separates the methods: all agree up to $F=16$, but beyond $F=18$ only second-order methods preserve the invariant measure and Lyapunov spectrum. On both systems, randomized Jacobian matching performs comparably to explicit Hessian matching at much lower cost.
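
The decomposition described in the abstract can be sketched under a first-order linearization. The notation here is an assumption for illustration and is not taken from the paper: $f$ and $\hat f$ are the true and learned vector fields, $D(x) = J_f(x) - J_{\hat f}(x)$ is the Jacobian mismatch, $\nabla D(x)$ is the corresponding $d \times d \times d$ tensor of Hessian mismatches, and the perturbation is taken to be $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$.

$$
\mathbb{E}_{\varepsilon}\,\bigl\|D(x+\varepsilon)\bigr\|_F^2
\;\approx\;
\mathbb{E}_{\varepsilon}\,\Bigl\|D(x) + \sum_{k=1}^{d} \partial_k D(x)\,\varepsilon_k\Bigr\|_F^2
\;=\;
\bigl\|D(x)\bigr\|_F^2 \;+\; \sigma^2 \sum_{k=1}^{d} \bigl\|\partial_k D(x)\bigr\|_F^2 .
$$

The cross term vanishes because $\mathbb{E}[\varepsilon] = 0$, and $\mathbb{E}[\varepsilon_k \varepsilon_l] = \sigma^2 \delta_{kl}$ collapses the quadratic term to the squared norm of $\nabla D(x)$. This linearized sketch drops the second-order Taylor term of $D$, which contributes a further term of order $\sigma^2$, so the paper's exact statement may differ. The loss only ever evaluates Jacobians, which have $d^2$ entries, rather than the $d^3$ entries of the Hessian tensor.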


2606.01578 2026-06-02 eess.AS cs.SD

Description and Discussion on DCASE 2026 Challenge Task 2: Noise-aware Unsupervised Anomalous Sound Detection for Machine Condition Monitoring

Tomoya Nishida, Noboru Harada, Daiki Takeuchi, Daisuke Niizumi, Keisuke Imoto, Kota Dohi, Harsh Purohit, Takashi Endo, Yohei Kawaguchi

Affiliations National Institute of Information and Communications Technology, Japan

AI summary Describes DCASE 2026 Challenge Task 2, which uses two-channel audio recorded near and far from the target machine to separate environmental noise from machine sound, improving the robustness of unsupervised anomalous sound detection under noisy conditions.

Comments this article draws heavily from arXiv:2506.10097

Abstract

This paper presents an overview of DCASE 2026 Challenge Task 2, titled "Noise-aware unsupervised anomalous sound detection (UASD) for machine condition monitoring." The task aims to advance noise-robust anomalous sound detection for machine condition monitoring under the unsupervised setting, where only normal machine sounds are available for training. Reliable detection under noisy conditions is crucial for practical deployment, but previous DCASE Task 2 settings provided limited information about environmental noise, potentially limiting UASD performance in highly noisy situations. To address this limitation, DCASE 2026 allows participants to exploit two-channel audio samples simultaneously captured at locations near and far from the target machine. Since the distant microphone is expected to contain relatively stronger environmental noise and weaker direct machine sounds, it may help distinguish environmental noise components from the target machine sounds. After the challenge submission deadline, challenge results and an analysis of the submitted systems will be added.


2606.01572 2026-06-02 eess.IV cs.CV

PINNOCHIO: Physics-Informed Neural Network for Coupled Hyperelastic Interface-Volume Simulation in Orthognathic Surgery

Jungwook Lee, Daeseung Kim, Kevin Gu, Zhangfeng Hu, Tianshu Kuang, Finn Hopeman, Michael A. K. Liebschner, Jaime Gateno, Pingkun Yan

Affiliations Department of Biomedical Engineering and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute; Department of Oral and Maxillofacial Surgery, Houston Methodist Research Institute; Department of Neurosurgery, Baylor College of Medicine

AI summary Proposes the PINNOCHIO framework, whose hybrid sequential decomposition decouples discontinuous bone-soft-tissue interface motion from continuous volumetric hyperelastic deformation, enabling stable training and a physics-enabled sim-to-real adaptation strategy. On a 40-patient cohort it outperforms existing baselines and addresses the accuracy-efficiency trade-off.

Comments This work has been submitted to MICCAI 2026

Abstract

Predicting patient-specific facial soft-tissue deformation is critical for iterative orthognathic surgery planning. However, current computational methods face a strict accuracy-efficiency trade-off: high-fidelity Finite Element Methods (FEM) are computationally prohibitive, whereas pure deep learning models often produce biomechanically inconsistent results. While Physics-Informed Neural Networks (PINNs) offer a promising avenue, learning the complex heterogeneous mechanics of bone--soft-tissue interactions with only partial clinical supervision (i.e., outer facial surfaces) remains highly unstable. To overcome these challenges, we present PINNOCHIO, a novel physics-informed framework for facial soft-tissue simulation. PINNOCHIO introduces a hybrid sequential decomposition that explicitly decouples discontinuous bone--soft-tissue interface movements from continuous volumetric hyperelastic deformation. This structural separation enables stable training and facilitates a physics-enabled sim-to-real adaptation strategy, ensuring internal biomechanical consistency without requiring volumetric ground truth. Evaluated on a 40-patient clinical cohort, PINNOCHIO outperforms existing baselines in both surface accuracy and physical validity. Furthermore, it achieves a substantial speedup over FEM, successfully resolving the accuracy-efficiency trade-off to provide a highly reliable and practical tool for interactive surgical planning.


2606.01542 2026-06-02 cs.DC cs.AI cs.CL cs.DB cs.IR

Self-Conditioned Positional HNSW for Overlap-Aware Retrieval in Chunked-Document RAG Systems: Method and Industrial Evidence-Quality Audit

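Nataraj Agaram Sundar, Tejas Morabia

Affiliations eBay Inc.

AI summary Proposes Self-Conditioned Positional HNSW (SCP-HNSW), which uses a low-dimensional positional code and a two-pass query procedure for overlap-aware retrieval that reduces repeated evidence, and reports industrial audit data that motivate the approach.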

Comments 11 pages, 5 figures, 4 tables

Abstract

Chunked-document retrieval is a common component of retrieval-augmented generation (RAG) systems. Documents are split into overlapping chunks, embedded, and indexed with approximate nearest-neighbor search such as hierarchical navigable small world graphs (HNSW). Overlap improves boundary coverage but induces a practical failure mode: top-k retrieval often returns near-adjacent chunks that repeat evidence and waste prompt budget. We propose Self-Conditioned Positional HNSW (SCP-HNSW), a lightweight modification that appends a low-dimensional positional code to chunk embeddings and uses a two-pass query procedure to estimate and apply a query-specific document-position prior. SCP-HNSW leaves HNSW graph construction and traversal unchanged while adding an auditable minimum-index-gap selector for final context construction. We also integrate industrial review artifacts for generated evidence quality: a 770-review text-evidence audit with 318 fully labeled reviews and a 70-case OCR audit with 350 ratings. The text audit shows that 574 of 770 projected reviews are rated 3/5, only 39 fall in the 1-2 range, and narrative reviewer detail appears much more often than structured issue flags. The OCR audit shows slice-level pass rates from 95% for clean chat screenshots to 45% for handwritten/blurry captures, with moderate to strong agreement. These results motivate overlap-aware, audit-friendly RAG retrieval and identify the remaining controlled retrieval ablations needed for causal performance claims.
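
One way to picture a minimum-index-gap selector for final context construction is the greedy sketch below. The abstract does not specify the selector's details, so the function name, the `(doc_id, chunk_index, score)` tuple layout, and the per-document gap rule are all assumptions made for illustration.

```python
def select_with_min_gap(ranked_chunks, k, min_gap):
    """Greedily keep up to k chunks, skipping near-adjacent chunks of the same document.

    ranked_chunks: iterable of (doc_id, chunk_index, score), sorted best first.
    A candidate is skipped if an already selected chunk from the same document
    lies fewer than min_gap positions away.
    """
    selected = []
    for doc_id, idx, score in ranked_chunks:
        too_close = any(
            d == doc_id and abs(idx - i) < min_gap for d, i, _ in selected
        )
        if too_close:
            continue
        selected.append((doc_id, idx, score))
        if len(selected) == k:
            break
    return selected


# Hypothetical top-k list where overlapping neighbours crowd the head of the ranking.
ranked = [
    ("doc1", 4, 0.93),
    ("doc1", 5, 0.92),
    ("doc1", 3, 0.90),
    ("doc2", 0, 0.88),
    ("doc1", 9, 0.80),
]
picked = select_with_min_gap(ranked, k=3, min_gap=2)
print(picked)
```

Here chunks 5 and 3 of `doc1` are dropped because they neighbour the already selected chunk 4, so the prompt budget goes to `doc2` and a distant part of `doc1` instead. Because the rule is a plain index comparison, each skip decision is easy to log and audit.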


2606.01539 2026-06-02 stat.ME cs.LG

Scalable Counterfactual Risk Estimation for Rare Events in Longitudinal Data

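Xiaohui Yin, Avijit Mitra, Ying Zhou, Kun Chen, Hong Yu

Affiliations University of Connecticut Storrs; University of Massachusetts Amherst; University of Massachusetts Lowell

AI summary To address the class imbalance and computational burden caused by rare events in longitudinal survival data, the paper proposes a scalable subsampling and reweighting strategy that can be applied to causal effect estimators such as ICE, improving stability while preserving consistency.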

Comments Accepted at KDD-2026, 12 pages

Abstract

Estimating the causal effect of time-varying treatments on survival outcomes in large observational studies is computationally demanding, particularly when outcomes are rare. While g-formula-based methods such as the iterative conditional expectation (ICE) estimator provide a principled framework for longitudinal causal inference, they become computationally expensive, especially when bootstrap-based variance estimation is required. In addition, outcome rarity at each time point induces severe class imbalance, leading to instability and convergence issues in logistic regression and related models. To address these challenges, we propose a principled subsampling and reweighting strategy for longitudinal survival data that can be applied to a range of existing causal effect estimators in this setting, including the ICE estimator. The proposed method substantially reduces computational burden while preserving consistency and improving estimation stability in rare-outcome settings. We evaluate the method through simulations and validate it using a large-scale EHR cohort study on social and behavioral determinants of health (SBDH) and suicide risk, demonstrating its effectiveness for modeling rare outcomes in longitudinal data.


2606.01533 2026-06-02 cs.MA cs.CL cs.LG

Multi-Agent Computer Use

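Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried

Affiliations Carnegie Mellon University

AI summary To address the shortcomings of single-agent computer use agents on complex long-horizon tasks, the paper proposes a multi-agent computer use system that decomposes tasks into a directed acyclic graph and executes them in parallel, improving performance by 3.4-25.5% on several benchmarks and speeding up task completion by about 1.5x.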

Abstract

Computer use agents (CUAs) today are primarily deployed as single serial agents. This setup is suboptimal for complex long-horizon tasks that benefit from task decomposition, parallel execution, and consistent re-planning based on new information. In this paper, we argue that we should instead move towards evaluating and building multi-agent computer use (MACU) systems. These systems, which emphasize planning and parallel execution, alleviate many of the shortcomings of single-agent CUAs. We propose a general multi-agent setup in which a manager model decomposes computer use tasks as a directed acyclic graph (DAG), encoding relevant dependencies and goals for subagents. At each iteration, the manager dispatches parallel CUA subagents to carry out nodes on the ready frontier of the DAG, and continuously revises the DAG (adding, canceling, or rewriting nodes) as new findings arrive from subagents. This design treats the partially observable environment of computer use as a first-class challenge: information that downstream agents may not be able to re-observe is retained and passed forward through the manager and DAG structure. We demonstrate that MACU consistently improves over strong single-agent baselines by $3.4-25.5\%$ on desktop (OSWorld) and web navigation (Online-Mind2Web, WebTailBench, Odysseys) benchmarks, exhibits more favorable test-time scaling, and solves complex long-horizon tasks where single-agent CUAs get stuck. On Odysseys, a long-horizon web navigation benchmark, MACU improves average task completion wall-clock time by ${\sim} 1.5 \times$, demonstrating its efficacy in speeding up traditionally slow CUA pipelines. Our findings highlight that multi-agent coordination is a promising axis for scaling computer use agents to work productively for longer and more effectively. We release all code and interactive visualizations at https://jykoh.com/multi-agent-computer-use.
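
The "ready frontier" of a task DAG can be illustrated with a small sketch: a node is ready once every node it depends on has finished and it has not itself been started. The task names and the dictionary layout are hypothetical, and this is not the released MACU code.

```python
def ready_frontier(deps, done, running=frozenset()):
    """Return the nodes whose dependencies are all done and that are not yet started.

    deps: dict mapping each node to the list of nodes it depends on.
    done: set of finished nodes; running: set of nodes currently being executed.
    """
    return sorted(
        node
        for node, parents in deps.items()
        if node not in done
        and node not in running
        and all(p in done for p in parents)
    )


# Hypothetical decomposition of a travel-booking task.
deps = {
    "search_flights": [],
    "search_hotels": [],
    "compare_prices": ["search_flights", "search_hotels"],
    "book": ["compare_prices"],
}

first_wave = ready_frontier(deps, done=set())
second_wave = ready_frontier(deps, done={"search_flights", "search_hotels"})
print(first_wave, second_wave)
```

The two searches have no dependencies, so they can be dispatched to parallel subagents in the first wave, while `compare_prices` only becomes ready after both finish. A manager that revises the DAG would simply edit `deps` between calls and recompute the frontier.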


2606.01513 2026-06-02 cs.DC cs.AI cs.CL cs.LG

Compliance-Scored Best-of-N Guardrail Orchestration for Multimodal Document Generation in Payments Dispute Defense

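Nataraj Agaram Sundar, Tejas Morabia

Affiliations eBay Inc.

AI summary Proposes a guardrail orchestration layer that combines multi-candidate generation with a compliance-score-based early exit. Through parallel generation, weighted scoring, and best-output selection, it targets high compliance and low latency in payments dispute defense.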

Comments 8 pages, 7 figures, 4 tables. Preprint. Applied systems paper on compliance-scored guardrail orchestration for multimodal LLM document generation. Contains aggregate operational readouts; not a randomized A/B test

Abstract

High-stakes enterprise document generation, including financial dispute narratives, compliance notices, and audit summaries, demands schema correctness, policy compliance, and low-latency operation at scale. Prior to a unified guardrail layer, production systems often stitched together separate PII redaction, content moderation, and format validation steps, leading to fragmented logic, slower request paths, and higher operational cost. We present a guardrail orchestration layer for text and image inputs that couples multi-candidate generation with an explicit compliance score used for early exit. The framework runs configurable parallel generation heads, scores candidates against weighted guardrails including PII detection, content moderation, schema constraints, and domain rules, and returns the best-scoring output with selection metadata. The available operational readout reports 5 attempts within 20 seconds and 91 percent compliance. For payments dispute defense summaries, we analyze aggregate operational scenario readouts rather than a randomized A/B test. Variable cohorts show higher count win rates than controls overall, 301/659 versus 536/1548, corresponding to +11.0 percentage points with 95 percent confidence interval [6.6, 15.5] and p < 0.001, and for adjusted item-not-received cases, +7.5 percentage points with 95 percent confidence interval [0.2, 15.7] and p = 0.045. Fraud and local evidence-ranking deltas are directionally positive but not statistically significant from the aggregate count data. We also report reviewer-calibrated Responsible-AI evidence-quality signals from 770 generated-evidence reviews and a 70-case OCR slice, and document the reproducibility boundary through the request interface, scoring logic, pseudocode, and operational evidence boundary.
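
The overall win-rate comparison in the abstract can be checked with a short calculation. The abstract does not say which interval method was used; the sketch below assumes a simple Wald interval for a difference of two proportions with z = 1.96, and the function name is hypothetical.

```python
from math import sqrt


def diff_in_proportions(x1, n1, x2, n2, z=1.96):
    """Difference p1 - p2 with a Wald confidence interval (assumed method)."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * se, diff + z * se


# Counts quoted in the abstract: variable cohorts 301/659, controls 536/1548.
diff, low, high = diff_in_proportions(301, 659, 536, 1548)
print(f"{100 * diff:+.1f} pp, 95% CI [{100 * low:.1f}, {100 * high:.1f}]")
```

Under this assumption the counts give about +11.0 percentage points with an interval of roughly [6.6, 15.5], which agrees with the reported figures to one decimal place. As the abstract itself notes, these are aggregate operational readouts, so the interval describes sampling variability only and not a causal effect.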


2606.01508 2026-06-02 cs.CR cs.AI

Agent Operating Systems (AOS): Integrating Agentic Control Planes into, and Beyond, Traditional Operating Systems

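Ankur Sharma, Deep Shah

Affiliations Independent Researcher

AI summary Proposes the Agent Operating System (AOS) architecture, which integrates an agentic control plane into existing operating systems or gradually subsumes selected OS responsibilities, to address the limitations of traditional operating systems in scheduling, memory management, security, observability, and governance for long-lived, goal-directed agentic AI workloads.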

Abstract

Traditional operating systems were designed around deterministic programs, explicit control flow, and human-initiated workflows. Their core abstractions (processes, threads, system calls, files, and permissions) assume bounded behavior and predictable interaction patterns. Agentic AI systems introduce a different execution model: long-lived, goal-directed entities that reason probabilistically, invoke tools dynamically, and adapt behavior based on feedback. While agents can be implemented as user-space applications today, their execution characteristics stress OS boundaries in scheduling, memory and state management, security, observability, and governance. This paper introduces the concept of an Agent Operating System (AOS), a systems architecture that integrates an agentic control plane into existing operating systems or, in some models, subsumes selected OS responsibilities over time. We provide a precise definition of an AOS, explicit assumptions and non-goals, and a structured decomposition of AOS responsibilities into schedulers, context and memory management, tool and capability registries, policy and trust enforcement, and observability and audit. We analyze limitations of classical OS abstractions for agent workloads, propose integration models from user-space runtimes to distributed control planes, and map AOS concepts onto Linux and Windows primitives. We present security and safety implications, including agent-specific threat models, and define evaluation criteria that emphasize deterministic enforcement, auditability, and operator comprehensibility. The objective is not to replace operating systems wholesale, but to establish a rigorous systems foundation for agentic computation that remains controllable, accountable, and secure at scale.


2606.01504 2026-06-02 cs.IR cs.LG

Semantic Retrieval for Product Search in E-Commerce

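Nikhil Kothari, Saksham Samdani, Ritam Mallick, Praveen Gupta, Ankit Vijay, Surender Kumar

Affiliations Flipkart, India

AI summary To handle short, noisy, colloquial queries and fine-grained attribute distinctions in e-commerce search, the paper proposes a two-stage training method for a Siamese LLM dual-encoder that uses contrastive learning and preference optimization to achieve exact matching and ranking.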

Abstract

Semantic retrieval in e-commerce must handle short, noisy, and colloquial queries over large product catalogs with fine-grained attribute distinctions. We present a Siamese LLM dual-encoder trained through a two-stage pipeline: contrastive learning with a false-negative margin mask to prevent penalization of near-duplicate products, followed by Relative Odds Alignment for Retrieval (ROAR), a preference optimization objective that extends Bradley-Terry to variable-sized graded relevance groups via consecutive odds-ratio margins. The training corpus mirrors this progression: substitute query-product pairs provide coarse semantic supervision in Stage 1, and graded relevance annotations drive fine-grained ranking in Stage 2. The resulting system accurately retrieves exact matches while correctly ordering substitutes and complementary products, with gains confirmed across query-frequency strata and business verticals, and statistical significance validated through live A/B deployment at scale.


2606.01494 2026-06-02 cs.CR cs.AI cs.SE

ClawHub Security Signals: When VirusTotal, Static Analysis, and SkillSpector Disagree

Vincent Koc, Patrick Erichsen, Jacob Tomlinson, Agustin Rivera, Michael Appel, Nir Paz

Affiliations OpenClaw Foundation USA; NVIDIA United Kingdom; NVIDIA USA

AI summary Studies 67,453 public skill versions in ClawHub and, through the disagreement among three scanners (VirusTotal, static heuristic analysis, and NVIDIA SkillSpector), shows that agent-skill security requires layered governance rather than single-scanner decisions.

Comments 10 pages, 1 figure, 7 tables, 1 supplementary dataset

Abstract

Agent skills extend AI agents with reusable instructions, tools, scripts, references, and workflows, establishing a security boundary distinct from both model safety and traditional package-malware detection. ClawHub Security Signals is a sanitized dataset of 67,453 latest public OpenClaw skill versions. Each row pairs redacted SKILL.md content and sanitized bundled files where present with a final ClawScan registry verdict and evidence from three scanner families: VirusTotal, static heuristic analysis, and NVIDIA SkillSpector. Rather than estimating malicious-skill prevalence, we study scanner disagreement. The three scanners rarely flag the same skills: any pair overlaps on at most 10.4% of their combined positives, only 0.69% of skills are flagged by all three, and 81.9% of flagged skills are identified by a single scanner. The disagreement is structured by attack surface. SkillSpector, which raises semantic agentic-risk advisories rather than malware-reputation signals, is positive for 19,209 of 25,504 suspicious rows (75.3%) but only 14 of 206 malicious rows (6.8%). The malicious-verdict region shows the inverse profile: 150 of 206 malicious rows (72.8%) are VirusTotal-positive, consistent with bundled-code malware evidence. These results show that agent-skill security requires layered governance, not single-scanner allow/block decisions. The corpus is released as a sanitized silver-standard dataset: labels are the registry's automated verdicts, not human-annotated ground truth, and the release represents an early, versioned snapshot intended to support the community while a human-annotated subset is developed. Further research is encouraged, including models tailored for skill-security triage.
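
The disagreement statistics quoted above can be reproduced in form on toy data. The sketch assumes that "overlap on combined positives" means intersection over union of two scanners' flagged sets, which the abstract does not state explicitly; the skill IDs and flag sets below are invented and are not drawn from the dataset.

```python
from itertools import combinations


def pairwise_overlap(flags):
    """Map each scanner pair to |A & B| / |A | B| over their flagged skill IDs."""
    out = {}
    for a, b in combinations(sorted(flags), 2):
        union = flags[a] | flags[b]
        out[(a, b)] = len(flags[a] & flags[b]) / len(union) if union else 0.0
    return out


# Invented flag sets keyed by scanner family.
flags = {
    "virustotal": {1, 2, 3},
    "static": {3, 4, 5, 6},
    "skillspector": {6, 7, 8, 9, 10},
}

overlap = pairwise_overlap(flags)
flagged_any = set.union(*flags.values())
flagged_all = set.intersection(*flags.values())
flagged_once = {s for s in flagged_any if sum(s in f for f in flags.values()) == 1}
single_scanner_share = len(flagged_once) / len(flagged_any)
print(overlap, len(flagged_all), single_scanner_share)
```

In this toy example no skill is flagged by all three scanners and 80% of flagged skills come from a single scanner, the same qualitative pattern the abstract reports at scale.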


2606.01490 2026-06-02 cs.SE cs.AI cs.MA

LLM Consortium for Software Design Refinement: A Controlled Experiment on Multi-Agent Collaboration Topologies

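Nagarjuna Kanamarlapudi, Praveen K

Affiliations LLM Consortium for Software Design Refinement

AI summary A controlled experiment evaluates 12 multi-agent LLM collaboration topologies for software architecture design, finding that the structural adversarial variant (v4b) and cross-model review (v2) rank first and second, while parallel merge performs worst due to token starvation and the Frankenstein effect.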

Comments 12 pages, 9 figures, 5 tables

Abstract

We present a controlled experiment evaluating 12 multi-agent LLM collaboration topologies for software architecture design. Using a $2\times2\times2$ factorial design (Authority $\times$ Roles $\times$ Dynamics), we conducted 520 experimental runs across 8 design tasks of varying complexity, with 5 repetitions each. Designs were evaluated on a 12-dimensional rubric by three independent automated evaluators (GPT-OSS 120B, Claude Opus 4.6, Claude Sonnet 4.6). We report four core findings. First, structural adversarial (v4b) ranks #1 by ensemble -- a prompt-engineered adversarial variant that demands rewrite mandates rather than patches (weighted ensemble: 4.637/5.0). Second, cross-model review wins unanimously at #2 -- generate with one model, review with another -- ranking #2 by all three evaluators (weighted ensemble: 4.606). Third, evaluator diversity is itself a finding -- all three evaluators agree v4b is best and v3 is worst, but disagree sharply on v2b (Claude d=1.44 vs. GPT-OSS d=0.45), revealing how different model families weight design qualities. Fourth, parallel merge is fundamentally broken -- all three evaluators place merge variants in the bottom tier (3.65-3.79), due to token starvation and the Frankenstein effect. The weighted ensemble ($2\times$Opus + $2\times$Sonnet + $1\times$GPT-OSS) provides robust rankings across 520 runs, confirmed through independent cross-validation.
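
The weighted ensemble of evaluators can be read as a weighted mean with weights 2, 2, and 1. That normalization is an assumption (the abstract gives only the weights), and the per-evaluator scores below are made up for illustration.

```python
# Evaluator weights as stated in the abstract: 2x Opus, 2x Sonnet, 1x GPT-OSS.
WEIGHTS = {"opus": 2, "sonnet": 2, "gpt_oss": 1}


def weighted_ensemble(scores, weights=WEIGHTS):
    """Weighted mean of per-evaluator scores on the same 0-5 scale (assumed form)."""
    total_weight = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total_weight


# Hypothetical scores for one topology from the three evaluators.
example_scores = {"opus": 4.7, "sonnet": 4.6, "gpt_oss": 4.5}
ensemble_score = weighted_ensemble(example_scores)
print(round(ensemble_score, 3))
```

With these weights the two Claude evaluators account for four fifths of the ensemble score, so a ranking that also holds under each evaluator individually, as reported for v4b and v3, is more informative than the ensemble number alone.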
