arXivDaily: arXiv daily academic digest, updated Monday to Friday
2604.26176 2026-06-09 cs.DB cs.CL Version update

CacheRAG: A Semantic Caching System for Retrieval-Augmented Generation in Knowledge Graph Question Answering

Yushi Sun, Lei Chen

Affiliations: HKUST, Hong Kong, China; HKUST (GZ), Guangzhou, China

AI summary: LLM-driven KGQA systems act as stateless planners, which leads to schema hallucinations and limited retrieval coverage. CacheRAG is a cache-augmented architecture that turns the stateless planner into a continual learner through a schema-agnostic interface, diversity-optimized cache retrieval, and bounded heuristic expansion, significantly improving accuracy and truthfulness.

Abstract

The integration of Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) has significantly advanced Knowledge Graph Question Answering (KGQA). However, existing LLM-driven KGQA systems act as stateless planners, generating retrieval plans in isolation without exploiting historical query patterns: analogous to a database system that optimizes every query from scratch without a plan cache. This fundamental design flaw leads to schema hallucinations and limited retrieval coverage. We propose CacheRAG, a systematic cache-augmented architecture for LLM-based KGQA that transforms stateless planners into continual learners. Unlike traditional database plan caching (which optimizes for frequency), CacheRAG introduces three novel design principles tailored for LLM contexts: (1) Schema-agnostic user interface: A two-stage semantic parsing framework via Intermediate Semantic Representation (ISR) enables non-expert users to interact purely in natural language, while a Backend Adapter grounds the LLM with local schema context to compile executable physical queries safely. (2) Diversity-optimized cache retrieval: A two-layer hierarchical index (Domain $\rightarrow$ Aspect) coupled with Maximal Marginal Relevance (MMR) maximizes structural variety in cached examples, effectively mitigating reasoning homogeneity. (3) Bounded heuristic expansion: Deterministic depth and breadth subgraph operators with strict complexity guarantees significantly enhance retrieval recall without risking unbounded API execution. Extensive experiments on multiple benchmarks demonstrate that CacheRAG significantly outperforms state-of-the-art baselines (e.g., +13.2% accuracy and +17.5% truthfulness on the CRAG dataset).
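
The diversity-optimized retrieval step in (2) can be illustrated with a generic Maximal Marginal Relevance selection. This is a minimal sketch under assumptions, not CacheRAG's implementation: the embedding vectors, the cosine similarity, and the names `mmr_select` and `lam` are hypothetical, and the two-layer Domain → Aspect index is omitted.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mmr_select(query, candidates, k, lam=0.5):
    """Return indices of up to k candidates chosen greedily by MMR.

    lam weights relevance to the query against redundancy with the
    examples already selected (lam=1.0 means pure relevance ranking).
    """
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query, candidates[i])
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical embeddings of three cached examples; the first two are near-duplicates.
query = [1.0, 0.0]
cached = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]
diverse = mmr_select(query, cached, k=2, lam=0.3)
relevance_only = mmr_select(query, cached, k=2, lam=1.0)
```

With a low `lam` the second pick skips the near-duplicate in favor of the structurally different example, which is the effect the abstract describes as mitigating reasoning homogeneity.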

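2604.25965 2026-06-09 stat.ML cs.LG Version update

Adversarial Robustness of NTK Neural Networks

Yuxuan Hou

Affiliations: Qiuzhen College, Tsinghua University; Yau Mathematical Sciences Center, Tsinghua University

AI summary: The paper studies the adversarial robustness of NTK neural networks in nonparametric regression. It derives minimax optimal rates for adversarial regression in Sobolev spaces and proves that NTK networks trained by gradient flow with early stopping attain this rate, whereas in the overfitting regime the minimum-norm interpolant is vulnerable to adversarial perturbations.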

Abstract

Deep learning models are widely deployed in safety-critical domains, but remain vulnerable to adversarial attacks. In this paper, we study the adversarial robustness of NTK neural networks in the context of nonparametric regression. We establish minimax optimal rates for adversarial regression in Sobolev spaces and then show that NTK neural networks, trained via gradient flow with early stopping, can achieve this optimal rate. However, in the overfitting regime, we prove that the minimum norm interpolant is vulnerable to adversarial perturbations.

2509.22097 2026-06-09 cs.SE cs.AI cs.CL cs.CR Version update

SecureVibeBench: Benchmarking Secure Vibe Coding of AI Agents via Reconstructing Vulnerability-Introducing Scenarios

Junkai Chen, Huihui Huang, Yunbo Lyu, Junwen An, Jieke Shi, Chengran Yang, Ting Zhang, Haoye Tian, Yikun Li, Zhenhao Li, Xin Zhou, Xing Hu, David Lo

Affiliations: University of Science and Technology of China

AI summary: The paper presents SecureVibeBench, a benchmark of 105 C/C++ secure coding tasks that evaluates whether AI agents can generate secure code in realistic scenarios, and observes that existing benchmarks do not support fair comparisons between human developers and AI agents.

Comments ACL 2026 Main Conference. Our code and data are on https://github.com/iCSawyer/SecureVibeBench

Abstract

Large language model-powered code agents are rapidly transforming software engineering, yet the security risks of their generated code have become a critical concern. Existing benchmarks have provided valuable insights, but they fail to capture scenarios in which vulnerabilities are actually introduced by human developers, making fair comparisons between humans and agents infeasible. We therefore introduce SecureVibeBench, a benchmark of 105 C/C++ secure coding tasks sourced from 41 projects in OSS-Fuzz for code agents. SecureVibeBench has the following features: (i) realistic task settings that require multi-file edits in large repositories, (ii) aligned contexts based on real-world open-source vulnerabilities with precisely identified vulnerability introduction points, and (iii) comprehensive evaluation that combines functionality testing and security checking with both static and dynamic oracles. We evaluate 5 popular code agents, such as OpenHands, backed by 5 LLMs (e.g., Claude Sonnet 4.5) on SecureVibeBench. Results show that current agents struggle to produce code that is both correct and secure: even the best-performing one produces merely 23.8% correct and secure solutions on SecureVibeBench. Our code and data are on https://github.com/iCSawyer/SecureVibeBench.

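2604.20897 2026-06-09 cs.IT cs.AI math.IT physics.comp-ph Version update

Watts-per-Intelligence Part II: Algorithmic Catalysis

Elija Perrier

Affiliations: Centre for Quantum Software and Information

AI summary: Building on the watts-per-intelligence framework, the paper develops a thermodynamic theory of algorithmic catalysis: reusable computational structures that reduce irreversible operations for a task class while satisfying bounded restoration and structural selectivity constraints. It proves that any class-specific speed-up is upper-bounded by algorithmic mutual information and that encoding this information incurs a minimum thermodynamic cost via Landauer erasure. Combining the two results gives a coupling theorem that lower-bounds the deployment horizon of an algorithmic catalyst.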

Comments Camera ready version, AGI-2026

Abstract

We develop a thermodynamic theory of algorithmic catalysis within the watts-per-intelligence framework, identifying reusable computational structures that reduce irreversible operations for a task class while satisfying bounded restoration and structural selectivity constraints. We prove that any class-specific speed-up is upper-bounded by the algorithmic mutual information between the substrate and the class descriptor, and that encoding this information incurs a minimum thermodynamic cost via Landauer erasure. Combining these results yields a coupling theorem that lower-bounds the deployment horizon required for an algorithmic catalyst to be energetically favourable. The framework is illustrated on an affine SAT class and situates contemporary learned systems within an information-thermodynamic constraint on intelligent computation.
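
As a rough illustration of the energy accounting in this abstract (not the paper's coupling theorem, whose exact statement the abstract does not give): Landauer's principle bounds the heat dissipated by erasing one bit at temperature $T$ from below by $k_B T \ln 2$. Assume, hypothetically, that installing a catalyst requires erasing $I$ bits and that the catalyst then saves an energy $\Delta E$ per task instance; the symbols $I$, $\Delta E$, and $N$ are introduced here for illustration only. A necessary break-even condition on the number of deployments $N$ then follows:

```latex
E_{\text{erase}} \;\ge\; I \, k_B T \ln 2,
\qquad
N \, \Delta E \;\ge\; E_{\text{erase}}
\;\Longrightarrow\;
N \;\ge\; \frac{I \, k_B T \ln 2}{\Delta E}.
```

The more information the catalyst must encode, the longer it has to stay deployed before the encoding cost is repaid, which is the qualitative shape of a lower bound on the deployment horizon.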

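2512.03465 2026-06-09 cs.CR cs.CL cs.IR Version update

Tuning for TraceTarnish: Techniques, Trends, and Testing Tangible Traits

Robert Dilworth

Affiliations: Department of Computer Science and Engineering, Mississippi State University

AI summary: The paper rigorously evaluates the TraceTarnish attack script, which applies adversarial stylometry to anonymize the authorship of text messages. Using Reddit comment data augmented with StyloMetrix, it isolates the stylometric features with significant information gain, which can be used to detect text tampering.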

Comments 20 pages, 8 figures, 2 tables

Abstract

In this study, we more rigorously evaluated our attack script TraceTarnish, which leverages adversarial stylometry principles to anonymize the authorship of text-based messages. To ensure the efficacy and utility of our attack, we sourced, processed, and analyzed Reddit comments -- comments that were later alchemized into TraceTarnish data -- to gain valuable insights. The transformed TraceTarnish data was then further augmented by StyloMetrix to manufacture stylometric features -- features that were culled using the Information Gain criterion, leaving only the most informative, predictive, and discriminative ones. Our results found that function words and function word types (L_FUNC_A & L_FUNC_T); content words and content word types (L_CONT_A & L_CONT_T); and the Type-Token Ratio (ST_TYPE_TOKEN_RATIO_LEMMAS) yielded significant Information-Gain readings. The identified stylometric cues -- function-word frequencies, content-word distributions, and the Type-Token Ratio -- serve as reliable indicators of compromise (IoCs), revealing when a text has been deliberately altered to mask its true author. Similarly, these features could function as forensic beacons, alerting defenders to the presence of an adversarial stylometry attack; granted, in the absence of the original message, this signal may go largely unnoticed, as it appears to depend on a pre- and post-transformation comparison. "In trying to erase a trace, you often imprint a larger one." Armed with this understanding, we framed TraceTarnish's operations and outputs around these five isolated features, using them to conceptualize and implement enhancements that further strengthen the attack.
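
One of the cues named above, the Type-Token Ratio, is simple enough to sketch. The version below is an illustrative approximation that works on lowercased surface tokens; it does not call StyloMetrix, and the lemma-based feature in the paper (ST_TYPE_TOKEN_RATIO_LEMMAS) would require a lemmatizer. The function name and the two sample sentences are hypothetical.

```python
import re

def type_token_ratio(text):
    """Distinct word forms divided by total word count (0.0 for empty text)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

# Hypothetical message before and after an obfuscating rewrite.
original = "the cat sat on the mat and the dog sat too"
rewritten = "a feline rested upon one rug while another hound reclined nearby"

ttr_before = type_token_ratio(original)   # 8 distinct forms over 11 tokens
ttr_after = type_token_ratio(rewritten)   # every token is distinct
```

A shift of this kind is only visible when both versions are available, which matches the abstract's caveat that the signal depends on a pre- and post-transformation comparison.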

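2604.17249 2026-06-09 cs.CR cs.AR cs.LG Version update

Bit-Flip Vulnerability of Shared KV-Cache Blocks in LLM Serving Systems

Yuji Yamamoto, Satoshi Matsuura

Affiliations: Institute of Science Tokyo

AI summary: The study exposes a bit-flip vulnerability in shared KV-cache blocks of LLM serving systems, characterized by silent divergence, selective propagation, and persistent accumulation, and proposes a checksum-based countermeasure that bounds the cumulative damage.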

Comments 12 pages, 4 figures. Accepted at SECRYPT 2026 (23rd International Conference on Security and Cryptography). Conference: https://secrypt.scitevents.org/

Abstract

Rowhammer on GPU DRAM has enabled adversarial bit flips in model weights; shared KV-cache blocks in LLM serving systems present an analogous but previously unexamined target. In vLLM's Prefix Caching, these blocks exist as a single physical copy without integrity protection. Using software fault injection under ideal bit targeting, we characterize worst-case severity and identify three properties: (1) Silent divergence - 13 of 16 BF16 bit positions produce coherent but altered outputs, indistinguishable from legitimate responses without a clean baseline. (2) Selective propagation - only requests sharing the targeted prefix are affected. (3) Persistent accumulation - no temporal decay occurs, so cumulative damage grows linearly with subsequent requests. Together, these constitute a threat profile distinct from weight corruption: silent divergence and selective propagation enable detection evasion; persistent accumulation then proceeds unchecked, yielding damage amplification bounded only by how long the block remains cached. A checksum-based countermeasure detects any single-bit corruption at scheduling time, bounding cumulative damage to one batch independent of the block's cache lifetime, with negligible overhead. These results argue for integrity protection of prefix blocks before end-to-end exploitation is demonstrated.
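
The checksum idea can be sketched in a few lines. This is a toy model under assumptions: the abstract does not say which checksum the authors use, so CRC-32 from Python's standard library stands in for it, a plain byte string stands in for a cached block, and none of the function names come from vLLM.

```python
import zlib

def block_checksum(block: bytes) -> int:
    """Checksum recorded when a prefix block is first cached."""
    return zlib.crc32(block)

def verify_block(block: bytes, expected: int) -> bool:
    """Check to run before a cached block is reused for a new request."""
    return zlib.crc32(block) == expected

def flip_bit(block: bytes, bit_index: int) -> bytes:
    """Simulate a single-bit fault at the given bit position."""
    corrupted = bytearray(block)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)

# Hypothetical 64-byte cached block standing in for KV-cache contents.
kv_block = bytes(range(64))
stored = block_checksum(kv_block)
tampered = flip_bit(kv_block, 37)

clean_ok = verify_block(kv_block, stored)
tampered_ok = verify_block(tampered, stored)
```

CRC-32 detects every single-bit error, so a check of this kind before reuse would stop a corrupted block from serving further requests, in line with the abstract's claim that damage is bounded independently of how long the block stays cached.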

2604.10842 2026-06-09 cs.SE cs.AI Version update

Resilient Write: A Six-Layer Durable Write Surface for LLM Coding Agents

Justice Owusu Agyemang, Jerry John Kponyo, Elliot Amponsah, Godfred Manu Addo Boakye, Kwame Opuni-Boachie Obour Agyekum

Affiliations: Sperixlabs, Ghana; Kwame Nkrumah University of Science and Technology, Kumasi, Ghana; VIA Cybersecurity Lab, Kwame Nkrumah University of Science and Technology, Kumasi, Ghana

AI summary: The paper presents Resilient Write, a six-layer durable write surface that makes file writes by coding agents more fault-tolerant, reducing recovery time and raising the agents' self-correction rate.

Abstract

LLM-powered coding agents increasingly rely on tool-use protocols such as the Model Context Protocol (MCP) to read and write files on a developer's workstation. When a write fails - due to content filters, truncation, or an interrupted session - the agent typically receives no structured signal, loses the draft, and wastes tokens retrying blindly. We present Resilient Write, an MCP server that interposes a six-layer durable write surface between the agent and the filesystem. The layers - pre-flight risk scoring, transactional atomic writes, resume-safe chunking, structured typed errors, out-of-band scratchpad storage, and task-continuity handoff envelopes - are orthogonal and independently adoptable. Each layer maps to a concrete failure mode observed during a real agent session in April 2026, in which content-safety filters silently rejected a draft containing redacted API-key prefixes. Three additional tools - chunk preview, format-aware validation, and journal analytics - emerged from using the system to compose this paper. A 186-test suite validates correctness at each layer, and quantitative comparison against naive and defensive baselines shows a 5x reduction in recovery time and a 13x improvement in agent self-correction rate. Resilient Write is open-source under the MIT license.
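
Of the six layers, transactional atomic writes follow a well-known pattern that can be sketched with the standard library. This is the generic write-to-temp-then-rename idiom, offered as an assumption about what such a layer might look like; it is not code from Resilient Write, and `atomic_write` is a hypothetical name.

```python
import os
import tempfile

def atomic_write(path: str, data: str) -> None:
    """Write data so that readers see either the old file or the new one, never a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target directory so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_path, path)     # atomic swap into place
    except BaseException:
        # On any failure, remove the temp file and leave the original untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If the session is interrupted before `os.replace`, the previous draft is still intact on disk, which is the property an agent needs in order to retry without losing work.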

2604.09787 2026-06-09 astro-ph.IM astro-ph.GA cs.LG Version update

Learning What's Real: Disentangling Signal and Measurement Artifacts in Multi-Sensor Data, with Applications to Astrophysics

Pablo Mercader-Perez, Carolina Cuesta-Lazaro, Daniel Muthukrishna, Jeroen Audenaert, V. Ashley Villar, David W. Hogg, Marc Huertas-Company, William T. Freeman

Affiliations: Massachusetts Institute of Technology; Flatiron Institute, Simons Foundation; Institute for Advanced Studies; Harvard University; New York University; Instituto de Astrofísica de Canarias

AI summary: The paper proposes a deep learning framework that uses overlapping observations, a dual-encoder architecture, and a counterfactual generation objective to separate signal from measurement artifacts in multi-sensor data, with the aim of improving accuracy in astrophysics research.

Comments Accepted at the 2nd Workshop on Foundation Models for Science at ICLR 2026. 10 pages, 7 figures (main text), plus appendix

Abstract

Data collected from the physical world is always a combination of multiple sources: an underlying signal from the physical process of interest and a signal from measurement-dependent artifacts from the sensor or instrument. This secondary signal acts as a confounding factor, limiting our ability to extract information about the physics underlying the phenomena we observe. Furthermore, it complicates the combination of observations in heterogeneous or multi-instrument settings. We propose a deep learning framework that leverages overlapping observations, a dual-encoder architecture, and a counterfactual generation objective to disentangle these factors of variation. The resulting representations explicitly separate intrinsic signals from sensor-specific distortions and noise, and can be used for counterfactual view generation, parameter inference unconfounded by measurement distortions, and instrument-independent similarity search. We demonstrate the effectiveness of our approach on astrophysical galaxy images from the DESI Legacy Imaging Survey (Legacy) and the Hyper Suprime-Cam (HSC) Survey as a representative multi-instrument setting. This framework provides a general recipe for scientific and multi-modal self-supervised pretraining: construct training pairs from overlapping observations of the same physical system, treat sensor- or modality-specific effects as augmentations, and learn invariant representations through counterfactual generation.

2604.08304 2026-06-09 cs.CR cs.AI Version update

Securing Retrieval-Augmented Generation: A Taxonomy of Attacks, Defenses, and Future Directions

Yuming Xu, Mingtao Zhang, Zhuohan Ge, Haoyang Li, Nicole Hu, Yongqi Zhang, Zhiyuan Wen, Jason Chen Zhang, Qing Li, Lei Chen

Affiliations: The Hong Kong Polytechnic University; The Hong Kong University of Science and Technology (Guangzhou)

AI summary: The paper proposes the SLOT taxonomy, which organizes the security risks and defenses of retrieval-augmented generation (RAG) along four axes: attack Surface, defense Layer, Objective (following the CIA properties), and Target. It also identifies structural mismatches in the knowledge-access pipeline and outlines future directions.

Comments We have curated a paper list on RAG security in https://github.com/TreeAI-Lab/Awesome-RAG-Security, and we warmly welcome authors who wish to have their new work included to contact us via email

Abstract

Retrieval-augmented generation (RAG) extends large language models (LLMs) with external knowledge, but this access path also introduces security risks that existing work often conflates with inherent LLM flaws. We frame secure RAG as securing external knowledge access and organize the literature with SLOT, a taxonomy along four axes: the attack Surface (S) where an adversary acts, the defense Layer (L) that controls the same point, the Objective (O) it breaks following the CIA properties, and the Target (T) it pursues, from a single known query (T1) to target-claim manipulation across a query distribution (T2). Mapping attacks, defenses, remediation, and evaluation onto a six-stage knowledge-access pipeline, we expose two structural mismatches. Finally, we discuss directions for more realistic targets, no-blind-spot and adaptively evaluated defenses, stronger confidentiality, and evaluation for multimodal and agentic RAG. The curated paper list for RAG security is in: https://github.com/TreeAI-Lab/Awesome-RAG-Security.

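2604.07125 2026-06-09 cs.CR cs.LG Version update

Scalable and Private Federated Learning Using Distributed Differential Privacy and Secure Aggregation

Wenjing Wei, Farid Nait-Abdesselam, Alla Jammine

Affiliations: Université Paris Cité

AI summary: The paper proposes the DDP-SA framework, which combines client-side local differential privacy with full-threshold additive secret sharing for secure aggregation, providing stronger end-to-end privacy guarantees while remaining computationally practical.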

Comments Submitted to IEEE Transactions on Dependable and Secure Computing (under review)

Abstract

This article presents DDP-SA, a scalable privacy-preserving federated learning framework that jointly leverages client-side local differential privacy (LDP) and full-threshold additive secret sharing (ASS) for secure aggregation. Unlike existing methods that rely solely on differential privacy or on secure multi-party computation (MPC), DDP-SA integrates both techniques to deliver stronger end-to-end privacy guarantees while remaining computationally practical. The framework introduces a two-stage protection mechanism: clients first perturb their local gradients with calibrated Laplace noise, then decompose the noisy gradients into additive secret shares that are distributed across multiple intermediate servers. This design ensures that (i) no single compromised server or communication channel can reveal any information about individual client updates, and (ii) the parameter server reconstructs only the aggregated noisy gradient, never any client-specific contribution. Extensive experiments show that DDP-SA achieves substantially higher model accuracy than standalone LDP while providing stronger privacy protection than MPC-only approaches. The proposed framework scales linearly with the number of participants and offers a practical, privacy-preserving solution for federated learning applications with controllable computational and communication overhead.
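
The two-stage mechanism described above (Laplace perturbation, then additive secret sharing) can be sketched for scalar gradients. Everything below is an illustrative assumption rather than the DDP-SA implementation: the modulus, the fixed-point scale, the noise scale, and all function names are hypothetical, and Python's `random` module is used only for reproducibility; a real deployment would need a cryptographically secure generator and noise calibrated to sensitivity and privacy budget.

```python
import random

MODULUS = 2**61 - 1   # hypothetical ring size for the shares
SCALE = 10**6         # hypothetical fixed-point precision

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def encode(x):
    """Map a real value to a fixed-point element of the ring."""
    return round(x * SCALE) % MODULUS

def decode(v):
    """Map a ring element back to a signed real value."""
    if v > MODULUS // 2:
        v -= MODULUS
    return v / SCALE

def make_shares(value, n_servers, rng):
    """Split value into n additive shares; all n are needed to reconstruct it."""
    shares = [rng.randrange(MODULUS) for _ in range(n_servers - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def aggregate(client_gradients, n_servers=3, noise_scale=0.1, seed=0):
    rng = random.Random(seed)
    server_totals = [0] * n_servers
    noisy_sum = 0.0  # kept only to check the reconstruction in this demo
    for gradient in client_gradients:
        noisy = gradient + laplace_noise(noise_scale, rng)   # stage 1: local perturbation
        noisy_sum += noisy
        shares = make_shares(encode(noisy), n_servers, rng)  # stage 2: secret sharing
        for s, share in enumerate(shares):
            server_totals[s] = (server_totals[s] + share) % MODULUS
    # The parameter server combines only the per-server totals.
    return decode(sum(server_totals) % MODULUS), noisy_sum

reconstructed, noisy_sum = aggregate([0.2, -0.5, 1.1])
```

Each intermediate server holds values that look uniformly random on their own, while the sum of the server totals reproduces the aggregated noisy gradient up to fixed-point rounding.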

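2603.29875 2026-06-09 cs.IR cs.AI cs.CL Version update

UnWeaving the knots of GraphRAG -- turns out VectorRAG is almost enough

Ryszard Tuora, Mateusz Galiński, Michał Godziszewski, Michał Karpowicz, Mateusz Czyżnikiewicz, Adam Kozakiewicz, Tomasz Ziętkiewicz

Affiliations: Samsung AI Warsaw

AI summary: The paper proposes UnWeaver, a framework that uses an LLM to disentangle document contents into entities spanning multiple chunks, improving the accuracy and efficiency of retrieval and generation. Experiments show that VectorRAG outperforms standard GraphRAG and comes close to state-of-the-art graph-based solutions at a fraction of the cost.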

Abstract

One of the key problems in retrieval-augmented generation (RAG) systems is that chunk-based retrieval pipelines represent the source chunks as atomic objects, mixing the information contained within each chunk into a single vector. These vector representations are then treated as isolated, independent, and self-sufficient, with no attempt to represent possible relations between them. Such an approach has no dedicated mechanism for handling multi-hop questions. Graph-based RAG systems aim to ameliorate this problem by modeling information as knowledge graphs, in which entities are represented by nodes connected by robust relations and form hierarchical communities. This approach, however, suffers from its own issues, among them an orders-of-magnitude increase in componential complexity to create graph-based indices and a reliance on heuristics for retrieval. We propose UnWeaver, a novel RAG framework that simplifies the idea of GraphRAG. UnWeaver uses an LLM to disentangle the contents of the documents into entities that can occur across multiple chunks. In the retrieval process, entities serve as an intermediate step for recovering the original text chunks, preserving fidelity to the source material. We argue that entity-based decomposition yields a more distilled representation of the original information and additionally reduces noise in the indexing and generation process. Furthermore, we show experimentally that in end-to-end QA evaluation VectorRAG performs better than standard GraphRAG and almost as well as current SOTA graph-based solutions, for a fraction of the cost.
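
The idea of using entities as an intermediate step back to the original chunks can be sketched with a plain inverted index. This is a simplified illustration, not UnWeaver itself: the LLM-based entity extraction is replaced by a hand-written mapping, matching is by exact entity name rather than by embedding similarity, and all names and sample data are hypothetical.

```python
from collections import defaultdict

def build_entity_index(chunk_entities):
    """Map each entity to the set of chunk ids in which it occurs."""
    index = defaultdict(set)
    for chunk_id, entities in chunk_entities.items():
        for entity in entities:
            index[entity].add(chunk_id)
    return index

def retrieve_chunks(query_entities, index, chunks):
    """Return original chunk texts, ranked by how many query entities they contain."""
    hits = defaultdict(int)
    for entity in query_entities:
        for chunk_id in index.get(entity, ()):
            hits[chunk_id] += 1
    ranked = sorted(hits, key=lambda cid: (-hits[cid], cid))
    return [chunks[cid] for cid in ranked]

# Hypothetical corpus; in the paper an LLM would produce the entity lists.
chunks = {
    "c1": "Marie Curie isolated radium.",
    "c2": "Marie Curie received the Nobel Prize twice.",
    "c3": "The Nobel Prize was established by Alfred Nobel.",
}
chunk_entities = {
    "c1": {"Marie Curie", "radium"},
    "c2": {"Marie Curie", "Nobel Prize"},
    "c3": {"Nobel Prize", "Alfred Nobel"},
}
index = build_entity_index(chunk_entities)
results = retrieve_chunks(["Marie Curie", "Nobel Prize"], index, chunks)
```

Because the generator receives the untouched chunk text rather than a graph summary, fidelity to the source material is preserved while an entity shared by several chunks still links them for multi-hop questions.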

2602.06934 2026-06-09 cs.PL cs.AI cs.DC cs.LO cs.MA Version update

Implementing Grassroots Logic Programs with Multiagent Transition Systems and AI (Full Version)

Ehud Shapiro

Affiliations: London School of Economics; Weizmann Institute of Science

AI summary: The paper derives two deterministic variants, dGLP and madGLP, implements shared variables across agents through global links, proves the variants correct, and describes how they serve as formal specifications for AI-driven implementations.

Abstract

Grassroots Logic Programs (GLP) is a concurrent logic programming language in which logic variables are partitioned into paired readers and writers. An assignment is produced at most once via a writer and consumed at most once via its paired reader, and may contain additional readers and/or writers. This enables the concise expression of rich multidirectional communication modalities. The language was introduced together with concurrent (cGLP) and multiagent (maGLP) operational semantics. Here, we derive from these (ia) dGLP, a deterministic counterpart of cGLP, and (ib) madGLP, a counterpart of maGLP in which deterministic agents communicate solely by asynchronous message passing, and prove them correct against their abstract counterparts. maGLP shared variable pairs spanning agents can be implemented as local variables paired by global links, with correctness following from disjoint substitution commutativity (a consequence of GLP's single-occurrence invariant). We further prove that madGLP is grassroots. Both dGLP and madGLP serve as formal specifications for an AI-driven implementation discipline (math → informal spec → Dart) employed and described here: from dGLP, AI (Claude) developed a workstation-based GLP implementation in Dart, and from madGLP it is developing a smartphone-based multiagent one.

2509.11485 2026-06-09 cond-mat.mtrl-sci cs.CV Version update

Geometric Analysis of Magnetic Labyrinthine Stripe Evolution via Deep Learning Segmentation

Vinícius Yu Okubo, Kotaro Shimizu, B. S. Shivaram, Gia-Wei Chern, Hae Yong Kim

Affiliations: Dept. Electronic Systems Engineering, Polytechnic School, University of São Paulo; Department of Applied Physics, The University of Tokyo; Department of Physics, University of Virginia

AI summary: Using deep learning segmentation and geometric analysis, the study quantifies the local structural evolution of magnetic stripes and identifies two evolution modes linked to field polarity.

Comments 17 pages, 15 figures. This manuscript will be submitted to the Journal of Magnetism and Magnetic Materials and is not yet under review

Abstract

Labyrinthine stripe patterns are common in many physical systems, yet their lack of long-range order makes quantitative characterization challenging. We investigate the evolution of such patterns in bismuth-doped yttrium iron garnet (Bi:YIG) films subjected to a magnetic field annealing protocol. A U-Net deep learning model, trained with synthetic degradations including additive white Gaussian and Simplex noise, enables robust segmentation of experimental magneto-optical images despite noise and occlusions. Building on this segmentation, we develop a geometric analysis pipeline based on skeletonization, graph mapping, and spline fitting, which quantifies local stripe propagation through length and curvature measurements. Applying this framework to 444 images from 12 annealing protocol trials, we analyze the transition from the "quenched" state to a more parallel and coherent "annealed" state, and identify two distinct evolution modes (Type A and Type B) linked to field polarity. Our results provide a quantitative analysis of geometric and topological properties in magnetic stripe patterns and offer new insights into their local structural evolution, and establish a general tool for analyzing complex labyrinthine systems.

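2604.01039 2026-06-09 cs.CR cs.AI Version update

Automated Framework to Evaluate and Harden LLM System Instructions against Encoding Attacks

Anubhab Sahu, Diptisha Samanta, Reza Soosahabi

Affiliations: Keysight Technologies

AI summary: The paper proposes an automated framework that evaluates whether LLM system instructions stay confidential under encoding attacks. Tests across four models and 46 instructions show high success rates for structured serialization attacks, and the paper proposes a mitigation based on a Chain-of-Thought reasoning model.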

Abstract

System Instructions in Large Language Models (LLMs) are commonly used to enforce safety policies, define agent behavior, and protect sensitive operational context in agentic AI applications. These instructions may contain sensitive information such as API credentials, internal policies, and privileged workflow definitions, making system instruction leakage a critical security risk highlighted in the OWASP Top 10 for LLM Applications. To avoid the overhead costs of reasoning models, many LLM applications rely on refusal-based instructions that block direct requests for system instructions, implicitly assuming that prohibited information can only be extracted through explicit queries. We introduce an automated evaluation framework that tests whether system instructions remain confidential when extraction requests are re-framed as encoding or structured output tasks. Across four common models and 46 verified system instructions, we observe high attack success rates (> 0.7) for structured serialization, where models refuse direct extraction requests but disclose protected content in the requested serialization formats. We further demonstrate a mitigation strategy based on one-shot instruction reshaping using a Chain-of-Thought reasoning model, indicating that even subtle changes in wording and structure of system instructions can significantly reduce attack success rate without requiring model retraining.

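2510.26307 2026-06-09 cs.CR cs.LG Version update

A Survey of Heterogeneous Graph Neural Networks for Cybersecurity Anomaly Detection

Laura Jiang, Reza Ryan, Qian Li, Nasim Ferdosian

Affiliations: GitHub

AI summary: The survey reviews heterogeneous graph neural networks for cybersecurity anomaly detection, presents a taxonomy by anomaly type and graph dynamics, assesses commonly used datasets and metrics, and identifies key challenges in modeling, data, and deployment.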

Comments 23 pages, 7 figures, and 97 references. Accepted by the Journal of Computer Security

Abstract

Anomaly detection is a critical task in cybersecurity, where identifying insider threats, access violations, and coordinated attacks is essential for ensuring system resilience. Graph-based approaches have become increasingly important for modeling entity interactions, yet most rely on homogeneous and static structures, which limits their ability to capture the heterogeneity and temporal evolution of real-world environments. Heterogeneous Graph Neural Networks (HGNNs) have emerged as a promising paradigm for anomaly detection by incorporating type-aware transformations and relation-sensitive aggregation, enabling more expressive modeling of complex cyber data. However, current research on HGNN-based anomaly detection remains fragmented, with diverse modeling strategies, limited comparative evaluation, and an absence of standardized benchmarks. To address this gap, we provide a comprehensive survey of HGNN-based anomaly detection methods in cybersecurity. We introduce a taxonomy that classifies approaches by anomaly type and graph dynamics, analyze representative models, and map them to key cybersecurity applications. We also review commonly used benchmark datasets and evaluation metrics, highlighting their strengths and limitations. Finally, we identify key open challenges related to modeling, data, and deployment, and outline promising directions for future research. This survey aims to establish a structured foundation for advancing HGNN-based anomaly detection toward scalable, interpretable, and practically deployable solutions.

2511.07280 2026-06-09 econ.GN cs.IR cs.LG q-fin.EC 版本更新

The Value of Personalized Recommendations: Evidence from Netflix

个性化推荐的价值:来自Netflix的证据

Kevin Zielnicki, Guy Aridor, Aurélien Bibaut, Allen Tran, Winston Chou, Nathan Kallus

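Affiliations: Netflix; Kellogg School of Management, Northwestern University

AI summary: Using Netflix viewership data, the paper builds a discrete choice model to value personalized recommendations. It finds that replacing the current recommender with a matrix factorization or popularity-based algorithm would reduce engagement (by 4% and 12%, respectively) and consumption diversity, and that most of the gain from recommendations comes from effective targeting rather than mechanical exposure.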

详情
AI中文摘要

个性化推荐系统塑造了用户在线选择的大部分内容,然而其针对性使得分离推荐价值和底层商品的价值具有挑战性。我们构建了一个嵌入推荐诱导效用、低秩异质性和灵活状态依赖的离散选择模型,并将其应用于Netflix的观众数据。我们利用推荐算法引入的异质性变化来识别并分别评估这些组成部分,同时恢复出无需模型的分流比率,以验证我们的结构模型。我们使用该模型评估了反事实场景,量化了个性化推荐产生的增量参与度。首先,我们显示,用矩阵分解或流行度为基础的算法取代当前推荐系统会导致参与度分别减少4%和12%,并降低消费多样性。其次,大多数推荐带来的消费增长来自于有效的定位,而非机械曝光,其中中等流行商品(而非广泛流行或非常小众商品)的收益最大。

英文摘要

Personalized recommendation systems shape much of user choice online, yet their targeted nature makes separating out the value of recommendation and the underlying goods challenging. We build a discrete choice model that embeds recommendation-induced utility, low-rank heterogeneity, and flexible state dependence and apply the model to viewership data at Netflix. We exploit idiosyncratic variation introduced by the recommendation algorithm to identify and separately value these components as well as to recover model-free diversion ratios that we can use to validate our structural model. We use the model to evaluate counterfactuals that quantify the incremental engagement generated by personalized recommendations. First, we show that replacing the current recommender system with a matrix factorization or popularity-based algorithm would lead to 4% and 12% reduction in engagement, respectively, and decreased consumption diversity. Second, most of the consumption increase from recommendations comes from effective targeting, not mechanical exposure, with the largest gains for mid-popularity goods (as opposed to broadly appealing or very niche goods).
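As a rough illustration of the kind of discrete choice model this abstract describes, the sketch below uses a plain multinomial logit with an additive recommendation term. It is not the paper's model: low-rank heterogeneity and state dependence are omitted, and the names `choice_probs`, `diversion_ratio`, `rec_bonus`, and all utility values are hypothetical.

```python
import math

def choice_probs(base_utils, rec_bonus, recommended):
    """Logit choice probabilities over titles; the outside option has utility 0."""
    utils = [u + (rec_bonus if i in recommended else 0.0)
             for i, u in enumerate(base_utils)]
    shift = max(utils + [0.0])            # subtract the max for numerical stability
    exps = [math.exp(u - shift) for u in utils]
    denom = math.exp(-shift) + sum(exps)  # outside option plus all titles
    return [e / denom for e in exps]

def diversion_ratio(base_utils, rec_bonus, recommended, removed, target):
    """Fraction of the removed title's demand that moves to `target`."""
    before = choice_probs(base_utils, rec_bonus, recommended)
    without = list(base_utils)
    without[removed] = float("-inf")      # exp(-inf) == 0.0, so the title is never chosen
    after = choice_probs(without, rec_bonus, recommended)
    return (after[target] - before[target]) / before[removed]

# Hypothetical utilities for three titles; title 1 is recommended.
utils = [0.5, 0.2, -0.3]
probs = choice_probs(utils, rec_bonus=1.0, recommended={1})
ratio = diversion_ratio(utils, 1.0, {1}, removed=1, target=0)
```

In a plain logit the diversion ratio from title j to title k reduces to P_k / (1 - P_j), so substitution depends only on market shares. That restriction is one reason richer structure, such as the heterogeneity terms mentioned in the abstract, is typically added.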

2603.23640 2026-06-09 cs.DC cs.LG 版本更新

LLM Inference at the Edge: Mobile, NPU, and GPU Performance Efficiency Trade-offs Under Sustained Load

边缘侧的大语言模型推理:移动、NPU和GPU在持续负载下的性能效率权衡

Pranay Tummalapalli, Sahil Arayakandy, Ritam Pal, Kautuk Kundan

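Affiliations: Conscious Engines

AI summary: The study benchmarks sustained on-device LLM inference across four platforms. Mobile devices are limited mainly by thermal management, while dedicated hardware hits other ceilings: battery power on the laptop RTX 4050 and on-module memory bandwidth on the Hailo-10H.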

Comments 14 pages, 5 figures, 10 tables

详情
AI中文摘要

在设备上部署大语言模型以实现持续运行的个人代理,需要硬件在功率、热限和内存方面的持续推理。我们对Qwen 2.5 1.5B(4位量化)在四个平台上的性能进行了基准测试:Raspberry Pi 5搭载Hailo-10H NPU、三星Galaxy S24 Ultra、iPhone 16 Pro和NVIDIA RTX 4050 GPU笔记本电脑。使用固定258个标记的提示,经过20次预热迭代,我们测量了吞吐量、延迟、功率和热行为。对于移动平台,热管理超越峰值计算成为主要限制:iPhone 16 Pro在两次迭代内几乎失去一半的吞吐量,而S24 Ultra因操作系统强制的GPU频率限制导致推理终止。在专用硬件上,不同的限制主导:RTX 4050受电池电量限制,而Hailo-10H受模块内存带宽限制。RTX 4050在34.1 W下维持131.7 tok/s;Hailo-10H在不到2 W下维持6.9 tok/s,接近零波动,与RTX 4050在能效比例上相匹配,但吞吐量低19倍。结果应视为单个模型和提示类型的平台级部署特征,反映硬件和软件的结合,而非单独的硬件能力声明。

英文摘要

Deploying large language models on-device for always-on personal agents demands sustained inference from hardware tightly constrained in power, thermal envelope, and memory. We benchmark Qwen 2.5 1.5B (4-bit quantised) across four platforms: a Raspberry Pi 5 with Hailo-10H NPU, a Samsung Galaxy S24 Ultra, an iPhone 16 Pro, and a laptop NVIDIA RTX 4050 GPU. Using a fixed 258-token prompt over 20 warm-condition iterations per device, we measure throughput, latency, power, and thermal behaviour. For mobile platforms, thermal management supersedes peak compute as the primary constraint: the iPhone 16 Pro loses nearly half its throughput within two iterations, and the S24 Ultra suffers a hard OS-enforced GPU frequency floor that terminates inference entirely. On dedicated hardware, distinct constraints dominate: the RTX 4050 is bounded by its battery power ceiling, while the Hailo-10H is limited by on-module memory bandwidth. The RTX 4050 sustains 131.7 tok/s at 34.1 W; the Hailo-10H sustains 6.9 tok/s at under 2 W with near-zero variance, matching the RTX 4050 in energy proportionality at 19x lower throughput. Results should be interpreted as platform-level deployment characterisations for a single model and prompt type, reflecting hardware and software combined, rather than general claims about hardware capability alone.
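The energy comparison in this abstract can be checked with simple arithmetic. The sketch below divides throughput by power to get tokens per joule. The helper name `tokens_per_joule` is an assumption, and because the abstract only states "under 2 W" for the Hailo-10H, 2.0 W is used here as an upper bound on power, which makes the result a lower bound on efficiency.

```python
def tokens_per_joule(tokens_per_second, watts):
    """Energy efficiency: tok/s divided by W (J/s) gives tokens per joule."""
    return tokens_per_second / watts

rtx_4050 = tokens_per_joule(131.7, 34.1)   # figures quoted in the abstract
hailo_floor = tokens_per_joule(6.9, 2.0)   # lower bound: the abstract only says "under 2 W"
throughput_ratio = 131.7 / 6.9

print(f"RTX 4050:  {rtx_4050:.2f} tok/J")
print(f"Hailo-10H: at least {hailo_floor:.2f} tok/J")
print(f"Throughput ratio: {throughput_ratio:.1f}x")
```

This gives roughly 3.86 tok/J for the RTX 4050 and at least 3.45 tok/J for the Hailo-10H, with a throughput ratio of about 19x, which is consistent with the abstract's statement that the two are comparable in energy terms despite the throughput gap.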

2511.18493 2026-06-09 eess.IV cs.AI cs.CV 版本更新

SAGE: Shape-Adapting Gated Experts for Adaptive Histopathology Image Segmentation

SAGE:适应性组织病理图像分割的形状自适应门控专家

Gia Huy Thai, Hoang-Nguyen Vu, Anh-Minh Phan, Quang-Thinh Ly, Thi-Ngoc-Truc Nguyen, Nhat Ho

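Affiliations: University of Science, VNU-HCM; Trivita AI; University of Technology, VNU-HCM; Michigan State University, USA; The University of Texas at Austin

AI summary: SAGE is a dynamic expert-routing framework that lets heterogeneous visual networks adapt to variation in cell size and shape, yielding accurate segmentation and robust generalization under distribution shift.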

Comments Accepted to CVPR 2026 (Findings Track). Project Page: https://oxyzgiahuy.github.io/sage/

详情
AI中文摘要

细胞大小和形状的显著差异仍然是计算机辅助癌症检测在吉像素全滑片图像中的主要障碍,由于细胞异质性。当前的CNN-Transformer混合模型使用静态计算图和固定路由,导致额外计算并难以适应输入变化。我们提出形状自适应门控专家(SAGE),一种输入自适应框架,通过双路径设计和层次门控以及形状适应枢纽(SA-Hub)将静态骨干网络重新配置为动态路由专家架构。SAGE以ConvNeXt和Vision Transformer UNet(SAGE-ConvNeXt+ViT-UNet)实现,其在EBHI上达到95.23%的Dice分数,在GlaS Test A和Test B上分别达到92.78%和91.42%的DSC分数,并在DigestPath上达到91.26%的DSC分数,同时在分布偏移下表现出稳健的泛化能力,通过自适应平衡局部细化和全局上下文。SAGE建立了可扩展的动态专家路由基础,从而促进灵活的视觉推理。项目页面:https://oxyzgiahuy.github.io/sage/

英文摘要

The significant variability in cell size and shape continues to pose a major obstacle in computer-assisted cancer detection on gigapixel Whole Slide Images (WSIs), due to cellular heterogeneity. Current CNN-Transformer hybrids use static computation graphs with fixed routing. This leads to extra computation and makes it harder to adapt to changes in input. We propose Shape-Adapting Gated Experts (SAGE), an input-adaptive framework that enables dynamic expert routing in heterogeneous visual networks. SAGE reconfigures static backbones into dynamically routed expert architectures via a dual-path design with hierarchical gating and a Shape-Adapting Hub (SA-Hub) that harmonizes feature representations across convolutional and transformer modules. Embodied as SAGE with ConvNeXt and Vision Transformer UNet (SAGE-ConvNeXt+ViT-UNet), our model achieves a Dice score of 95.23% on EBHI, DSC scores of 92.78% and 91.42% on GlaS Test A and Test B, respectively, and 91.26% DSC at the WSI level on DigestPath, while exhibiting robust generalization under distribution shifts by adaptively balancing local refinement and global context. SAGE establishes a scalable foundation for dynamic expert routing in visual networks, thereby facilitating flexible visual reasoning. Project page: https://oxyzgiahuy.github.io/sage/
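The Dice score (DSC) reported in this abstract is the standard overlap measure for segmentation masks. A minimal sketch for flat binary masks follows. The function name and the convention for two empty masks are assumptions, and the paper's evaluation code may aggregate per image or per slide differently.

```python
def dice_score(pred, target):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # convention assumed here: two empty masks count as a perfect match
    return 2.0 * intersection / total

# Toy masks: one overlapping pixel, two foreground pixels in each mask.
pred = [1, 1, 0, 0]
target = [1, 0, 1, 0]
score = dice_score(pred, target)
```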

2601.01279 2026-06-09 econ.TH cs.AI cs.CE cs.CL cs.GT 版本更新

Supracompetitive Pricing Under AI Monoculture

人工智能单一群体下的超竞争定价

Shengyu Cao, Ming Hu

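Affiliations: Rotman School of Management, University of Toronto

AI summary: The paper asks whether competing sellers that delegate pricing to a shared AI model can end up with supracompetitive prices. In a stylized duopoly model, configuring the AI for robustness and reproducibility can produce supracompetitive pricing, and above a critical output-fidelity threshold the market outcome depends on the model's initial pricing propensity.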

Comments 46 pages

详情
AI中文摘要

当竞争卖家将定价委托给共享的AI模型(如大型语言模型)时,相关推荐结合性能驱动的更新,聚合卖家反馈,引发一个问题:标准的AI部署实践是否会无意中产生超竞争定价?本文开发了一个简化的双寡头模型,其中两个卖家从共享的AI模型中获得定价推荐,该模型由两个参数特征化:一个倾向参数捕捉模型设置高价的倾向,一个输出保真度参数衡量该倾向与实际输出的一致性,其中倾向通过定期重新训练在观察到的结果上更新。我们发现,配置AI模型以鲁棒性和可重复性可以导致超竞争定价通过相变。在临界输出保真度阈值以下,竞争性定价是唯一的稳定结果。在临界值以上,模型表现出双稳态:竞争性和超竞争性定价都是局部稳定的,实际结果取决于模型的初始倾向。超竞争性定价提高了平均价格,但偶尔的低价推荐使检测变得复杂。对于完美输出保真度,任何内部初始倾向都会导致完全价格协调。对于有限训练批次大小为b,当初始倾向位于超竞争性盆地时,随着b的增加,超竞争性定价的概率接近1,不确定结果区域以O(1/√b)的速率缩小。任何减少模型倾向与卖家实际定价之间一致性的因素,无论是通过多样化AI供应商、引入推荐噪声还是减少卖家的遵守,都会将市场推向竞争性结果。

英文摘要

When competing sellers delegate pricing to a shared AI model, such as a large language model, correlated recommendations combined with performance-driven updates aggregating seller feedback raise a key question: can standard AI deployment practices inadvertently produce supracompetitive pricing? We develop a stylized duopoly model in which two sellers receive pricing recommendations from a shared AI characterized by two parameters: a propensity parameter capturing the model's tendency to set high prices and an output-fidelity parameter measuring alignment between this tendency and actual outputs, with propensity updated via periodic retraining on observed outcomes. We find that configuring AI models for robustness and reproducibility can lead to supracompetitive pricing via a phase transition. Below a critical output-fidelity threshold, competitive pricing is the unique stable outcome. Above it, the model exhibits bistability: both competitive and supracompetitive pricing are locally stable, with the realized outcome determined by the model's initial propensity. Supracompetitive pricing raises average prices, but occasional low-price recommendations complicate detection. With perfect output fidelity, full price coordination emerges from any interior initial propensity. For finite training batches of size $b$, when the initial propensity lies in the supracompetitive basin, the probability of supracompetitive pricing approaches 1 as $b$ increases, with the region of indeterminate outcomes shrinking at rate $O(1/\sqrt{b})$. Any factor reducing alignment between the model's propensity and sellers' actual pricing, whether through diversifying AI providers, introducing recommendation noise, or reducing seller adherence, pushes the market toward competitive outcomes.

2507.00260 2026-06-09 stat.ML cs.LG math.ST stat.ME stat.TH 版本更新

Disentangled Feature Importance

解耦特征重要性

Jin-Hong Du, Kathryn Roeder, Larry Wasserman

Affiliations: Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, China; Musketeers Foundation Institute of Data Science, The University of Hong Kong, Hong Kong SAR, China; Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA

AI summary: The paper proposes Disentangled Feature Importance (DFI) for attributing predictive signal across correlated measurement channels. DFI maps covariates to an independent latent representation under an entropic optimal transport geometry, computes importance there, and attributes it back to the original covariates, giving stable and interpretable attributions.

Comments 29 main and 44 supplementary pages

详情
AI中文摘要

当预测变量统计依赖时,特征重要性的适当定义取决于操作目标。条件增量措施适合于特征选择、获取和压缩,其中共享的预测信息被视为冗余。然而,对于事后解释,目标通常是将预测信号归因于相关测量通道。我们引入了解耦特征重要性(DFI),这是一种针对此设置的群体层面归因框架。DFI在指定的熵最优传输几何下将协变量映射到独立的潜在表示,计算潜在重要性,并通过巴里中心敏感度将重要性归因于原始协变量。我们证明了广泛的条件增量FI函数在平方误差损失下瞄准条件增量预测价值,因此回答了与依赖下的共享预测信号归因不同的问题。在固定传输成本、参考定律和正则化水平下,DFI定义了一个well-specified的估计量族。潜在分数具有功能ANOVA解释,并在高斯线性情况下,归因DFI恢复了相关回归器的经典R²分解。我们推导了在干扰率和光滑性条件下基于影响函数的推断,并在模拟和HIV-1中和抗性分析中展示了DFI在共享预测信号归因方面产生稳定、可解释、具有不确定性的归因。

英文摘要

When predictors are statistically dependent, the appropriate definition of feature importance depends on the operational goal. Conditional-incremental measures are well-suited for feature selection, acquisition, and compression, where shared predictive information is treated as redundancy. For post-hoc interpretation, however, the goal is often to attribute predictive signals across correlated measurement channels. We introduce Disentangled Feature Importance (DFI), a population-level attribution framework for this setting. DFI maps covariates to an independent latent representation under a specified entropic optimal transport geometry, computes latent importance, and attributes it back to the original covariates through barycentric sensitivities. We show that broad conditional-incremental FI functionals target conditional incremental predictive value under squared-error loss, and therefore answer a different question from attribution of shared predictive signal under dependence. Under fixed transport cost, reference law, and regularization level, DFI defines a well-specified family of estimands. Latent scores admit a functional ANOVA interpretation, and in the Gaussian linear case, the attributed DFI recovers the classical $R^2$ decomposition for correlated regressors. We derive influence-function-based inference under nuisance-rate and smoothness conditions, and show in simulations and an HIV-1 neutralization-resistance analysis that DFI yields stable, interpretable, uncertainty-quantified attributions of shared predictive signal.

2603.13679 2026-06-09 cs.HC cs.CV 版本更新

Toward Scalable Co-located Practical Learning: Assisting with Computer Vision and Multimodal Analytics

迈向可扩展的协同实践学习:协助计算机视觉和多模态分析

Xinyu Li, Linxuan Zhao, Yueqiao Jin, Yuchen Liu, Jin Zhou, Roberto Martinez-Maldonado, Dragan Gasevic, Lixiang Yan

Affiliations: Centre for Learning Analytics at Monash; Monash University; Department of Civil and Environmental Engineering; School of Education; The University of Hong Kong

AI summary: The study evaluates a fixed-camera pipeline in repeated nursing simulations. Two-stage source-to-target adaptation improves behaviour-detection accuracy, and the resulting behaviour traces support searchable simulation debriefing.

详情
AI中文摘要

协同实践学习在患者周围留下可见动作、任务资源和房间区域的痕迹,但这些痕迹通常通过实时观察或回顾视频审查来恢复。固定广角视频可以减少传感负担,但 debriefing 管道必须做更多:不仅要检测行为,还要在小摄像头位置变化后维持检测,将检测器推导的行为轨迹与指导员标注的结果相关联,并保持房间区域上下文。本研究在重复护理模拟中评估了固定摄像头管道。使用统一的六代码分类法,我们测试了YOLO26目标-only训练和两阶段源到目标适应,跨两个相同房间侧视数据源。然后将检测结果从51个指导员标注的会话转换为每秒行为和行为区域轨迹,用于速率、有序网络、转换网络和序列分析。两阶段适应将2021目标视图的平均mAP50从0.815提升到0.848,从0.690提升到0.855对于较小的2022目标视图;在平衡的目标配额$N=22$下,2022模型达到0.850 mAP50。在检测器推导的行为轨迹分析中,更高的手机使用特征化低任务表现会话。区域标签改变了患者互动的解释:在更高表现会话中,主要患者护理区域互动更强,而在较低表现会话中,次级区域互动更强。有序和转换网络模型显示,有序房间区域关系超越了行为频率,最强的任务表现分类器使用了区域和共在特征。最终的轨迹最适合可检索的模拟 debriefing,其中指导员检查检测到的时刻,而不是接收自动评估分数。

英文摘要

Co-located practical learning leaves evidence in visible actions around patients, task resources and room zones, but these traces are often recovered through live observation or retrospective video review. Fixed wide-angle video could reduce sensing burden, yet a debriefing pipeline must do more than detect behaviours: it must maintain detection after small camera-position shifts, relate the detector-derived behaviour trace to instructor-labelled outcomes and preserve room-zone context. This study evaluates a fixed-camera pipeline in repeated nursing simulation. Using a harmonised six-code taxonomy, we tested YOLO26 target-only training and two-stage source-to-target adaptation across two same-room side-view data sources. We then converted detections from 51 instructor-labelled sessions into one-second behaviour and behaviour-zone traces for rate, ordered-network, transition-network and sequence analyses. Two-stage adaptation improved mean mAP50 from 0.815 to 0.848 for the 2021 target view and from 0.690 to 0.855 for the smaller 2022 target view; with a balanced target quota of \(N = 22\), the 2022 model reached 0.850 mAP50. In the detector-derived behaviour trace analyses, higher phone use characterised low task-performance sessions. Zone labels changed the interpretation of patient interaction: primary patient-care-zone interaction was stronger in higher-performance sessions, while secondary-zone interaction was stronger in lower-performance sessions. Ordered and transition network models showed that ordered room-zone relations contributed beyond behaviour frequency, with the strongest task-performance classifier using zoned and co-presence features. The resulting trace is most appropriate for searchable simulation debriefing, where instructors inspect detected moments rather than receive automated assessment scores.
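The mAP50 figures in this abstract rest on an intersection-over-union (IoU) test at a 0.5 threshold. The sketch below shows only that matching step for axis-aligned boxes. The function names and the `(x1, y1, x2, y2)` box layout are assumptions, and a full mAP50 computation would also match classes and average precision over recall levels, which is not shown.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def is_true_positive(pred_box, truth_box, threshold=0.5):
    """A detection counts as correct at mAP50 when IoU is at least 0.5."""
    return iou(pred_box, truth_box) >= threshold
```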

2603.12046 2026-06-09 eess.AS cs.CV cs.SD 版本更新

Dr. SHAP-AV: Decoding Relative Modality Contributions via Shapley Attribution in Audio-Visual Speech Recognition

Dr. SHAP-AV:通过Shapley归因解码音频-视觉语音识别中的相对模态贡献

Umberto Cappellazzo, Stavros Petridis, Maja Pantic

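Affiliations: Imperial College London, UK; NatWest AI Research, UK

AI summary: Dr. SHAP-AV uses Shapley values to analyze modality contributions in audio-visual speech recognition. Models shift toward visual reliance under noise yet keep high audio contributions even under severe degradation, which motivates modality-weighting mechanisms and Shapley-based attribution as a standard diagnostic.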

Comments Accepted to INTERSPEECH 2026 [Long Paper track]. Project website: https://umbertocappellazzo.github.io/Dr-SHAP-AV

详情
AI中文摘要

音频-视觉语音识别(AVSR)利用音频和视觉信息在噪声环境下实现鲁棒识别。然而,模型如何平衡这些模态仍不清楚。我们提出了Dr.SHAP-AV框架,利用Shapley值分析AVSR中的模态贡献。通过在两个基准测试中六个模型上进行实验,不同SNR水平下,我们引入三种分析:全局Shapley用于整体模态平衡,生成Shapley用于解码过程中的贡献动态,时间对齐Shapley用于输入-输出对应性。我们的发现表明,在噪声下模型倾向于依赖视觉,但在严重退化下仍保持高音频贡献。模态平衡在生成过程中演变,时间对齐在噪声下保持稳定,SNR是驱动模态权重的主要因素。这些发现揭示了持续的音频偏见,推动了定制化的模态加权机制和基于Shapley的归因作为标准AVSR诊断工具。

英文摘要

Audio-Visual Speech Recognition (AVSR) leverages both acoustic and visual information for robust recognition under noise. However, how models balance these modalities remains unclear. We present Dr. SHAP-AV, a framework using Shapley values to analyze modality contributions in AVSR. Through experiments on six models across two benchmarks and varying SNR levels, we introduce three analyses: Global SHAP for overall modality balance, Generative SHAP for contribution dynamics during decoding, and Temporal Alignment SHAP for input-output correspondence. Our findings reveal that models shift toward visual reliance under noise yet maintain high audio contributions even under severe degradation. Modality balance evolves during generation, temporal alignment holds under noise, and SNR is the dominant factor driving modality weighting. These findings expose a persistent audio bias, motivating ad-hoc modality-weighting mechanisms and Shapley-based attribution as a standard AVSR diagnostic.
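To make the Shapley attribution idea concrete, the sketch below computes exact Shapley values for a two-modality game by averaging marginal contributions over all orderings. It is a generic textbook computation, not the paper's Global, Generative, or Temporal Alignment SHAP procedures. The coalition scores are hypothetical (they could stand for something like 1 - WER), and the name `shapley_values` is an assumption.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values; `value` maps a frozenset of players to a score."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when it joins the current coalition.
            totals[p] += value[with_p] - value[coalition]
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}

# Hypothetical scores for each subset of modalities.
value = {
    frozenset(): 0.0,
    frozenset({"audio"}): 0.6,
    frozenset({"video"}): 0.3,
    frozenset({"audio", "video"}): 0.8,
}
phi = shapley_values(["audio", "video"], value)
```

With these made-up scores, audio receives about 0.55 and video about 0.25. The two values sum to the score of the full audio-visual model minus the empty-input score, which is the efficiency property that makes Shapley values readable as a split of total performance.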

2603.11669 2026-06-09 eess.AS cs.SD 版本更新

SEMamba++: A General Speech Restoration Framework Leveraging Global, Local, and Periodic Spectral Patterns

SEMamba++:一种利用全局、局部和周期性频谱模式的通用语音修复框架

Yongjoon Lee, Jung-Woo Choi

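Affiliations: Korea Advanced Institute of Science and Technology (KAIST)

AI summary: SEMamba++ is a general speech restoration framework that integrates global, local, and periodic spectral features to improve restoration performance while remaining computationally efficient.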

Comments Accepted to Interspeech 2026 Long paper track. Project page: https://sites.google.com/view/semambapp

详情
AI中文摘要

通用语音修复需要能够解释复杂语音结构并在各种失真下工作的技术。虽然状态空间模型如SEMamba在语音去噪方面取得了进展,但它们并未针对关键语音特性如频谱周期性或多分辨率频率分析进行优化。在本文中,我们引入了一种架构,旨在整合语音特定的特征作为归纳偏置。特别是,我们提出了全局、局部和周期性(GLP)模块,一个有效的频率特征提取块,能够有效利用频率桶的属性。然后,我们设计了一个多分辨率并行时频双处理块以捕捉多样的频谱模式,并设计了一个可学习的映射以进一步提高模型性能。通过整合所有想法,所提出的SEMamba++在多个基线模型中表现最佳,同时保持计算效率。

英文摘要

General speech restoration demands techniques that can interpret complex speech structures under various distortions. While State-Space Models like SEMamba have advanced the state-of-the-art in speech denoising, they are not inherently optimized for critical speech characteristics, such as spectral periodicity or multi-resolution frequency analysis. In this work, we introduce an architecture tailored to incorporate speech-specific features as inductive biases. In particular, we propose the Global, Local, and Periodic (GLP) module, a frequency feature extraction block that effectively and efficiently leverages the properties of frequency bins. Then, we design a multi-resolution parallel time-frequency dual-processing block to capture diverse spectral patterns, and a learnable mapping to further enhance model performance. With all our ideas combined, the proposed SEMamba++ achieves the best performance among multiple baseline models while remaining computationally efficient.

2603.11250 2026-06-09 math.NA cs.LG cs.NA physics.flu-dyn 版本更新

A Machine Learning-Enhanced Hopf-Cole Formulation for Nonlinear Gas Flow in Porous Media

一种结合机器学习的Hopf-Cole公式用于多孔介质中非线性气体流动

V. S. Maduri, K. B. Nakshatrala

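Affiliations: Department of Civil & Environmental Engineering, University of Houston; Computational & Applied Mechanics Laboratory

AI summary: The paper presents a framework for modeling and inverting gas transport in porous media that combines a Klinkenberg-enhanced constitutive relation, Hopf-Cole-transformed mixed-form linear equations, a shared-trunk neural network architecture, and a DeepLS solver. It also supports estimating pressure-dependent permeability and slippage parameters from limited observations.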

详情
AI中文摘要

准确建模多孔介质中的气体流动对于许多技术应用至关重要,包括储层性能预测、碳捕集与封存以及燃料电池和电池。然而,此类建模仍具挑战性,因为存在强烈的非线性行为和模型参数的不确定性。特别是,由Klinkenberg模型描述的气体滑移效应引入了压力依赖的渗透率,这使数值模拟复杂化并掩盖了与经典达西流行为的偏差。为解决这些挑战,本文提出了一种整合的建模框架,结合了Klinkenberg增强的本构关系、Hopf-Cole变换的混合形式线性控制方程、共享树神经网络架构和Deep Least-Squares (DeepLS)求解器。Hopf-Cole变换将原始非线性流动方程重新表述为等价的线性系统,与达西模型密切相关。混合形式与共享树神经网络架构相结合,能够同时准确预测压力和速度场。进行了严格的收敛分析,理论和数值上都建立了所提出求解器的稳定性和收敛性。重要的是,所提出的框架还自然地促进了从有限或间接观测中反演压力依赖渗透率和滑移参数,从而能够高效估计难以实验测量的流动特性。数值结果展示了在广泛的压力范围内准确恢复流动动态和参数,突显了该框架在致密地层中气体传输建模和反演中的鲁棒性、准确性和计算效率。

英文摘要

Accurate modeling of gas flow through porous media is critical for many technological applications, including reservoir performance prediction, carbon capture and sequestration, and fuel cells and batteries. However, such modeling remains challenging due to strong nonlinear behavior and uncertainty in model parameters. In particular, gas slippage effects described by the Klinkenberg model introduce pressure-dependent permeability, which complicates numerical simulation and obscures deviations from classical Darcy flow behavior. To address these challenges, we present an integrated modeling framework for gas transport in porous media that combines a Klinkenberg-enhanced constitutive relation, Hopf-Cole-transformed mixed-form linear governing equations, a shared-trunk neural network architecture, and a Deep Least-Squares (DeepLS) solver. The Hopf-Cole transformation reformulates the original nonlinear flow equations into an equivalent linear system closely related to the Darcy model, while the mixed formulation, together with a shared-trunk neural architecture, enables simultaneous and accurate prediction of both pressure and velocity fields. A rigorous convergence analysis is performed both theoretically and numerically, establishing the stability and convergence properties of the proposed solver. Importantly, the proposed framework also naturally facilitates inverse modeling of pressure-dependent permeability and slippage parameters from limited or indirect observations, enabling efficient estimation of flow properties that are difficult to measure experimentally. Numerical results demonstrate accurate recovery of flow dynamics and parameters across a wide range of pressure regimes, highlighting the framework's robustness, accuracy, and computational efficiency for gas transport modeling and inversion in tight formations.
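For orientation, the Klinkenberg slip correction is usually written as an apparent permeability that grows as pressure falls. The display below states that standard relation and one substitution under which a Darcy-type flux becomes linear in a new variable, assuming constant $k_0$, $b$, $\mu$ and $p > 0$. The symbols $k_0$, $b$, $\mu$, and $\phi$ are assumed notation, and the substitution only illustrates the linearization idea: the abstract does not give the paper's actual transformation or governing equations.

```latex
k(p) = k_0\left(1 + \frac{b}{p}\right), \qquad
\mathbf{v} = -\frac{k(p)}{\mu}\,\nabla p
           = -\frac{k_0}{\mu}\,\nabla\!\left(p + b\ln p\right)
           = -\frac{k_0}{\mu}\,\nabla\phi, \qquad
\phi := p + b\ln p .
```

The middle equality follows from $\nabla(b \ln p) = (b/p)\,\nabla p$, so the pressure-dependent permeability is absorbed into the transformed variable and the flux takes the same form as classical Darcy flow with constant permeability $k_0$.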

2603.10823 2026-06-09 stat.ML cs.LG 版本更新

ReTabSyn: Realistic Tabular Data Synthesis via Reinforcement Learning

ReTabSyn:通过强化学习实现真实表格数据合成

Xiaofeng Lin, Seungbae Kim, Zhuoya Li, Zachary DeSoto, Charles Fleming, Guang Cheng

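Affiliations: University of California, Berkeley

AI summary: ReTabSyn uses reinforcement learning to steer a tabular data synthesizer toward the conditional distribution $P(y\mid X)$, improving data efficiency in low-data settings and outperforming state-of-the-art baselines.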

详情
AI中文摘要

深度生成模型可通过生成合成训练数据缓解数据稀缺和隐私问题,但在低数据、不平衡的表格设置中难以完全学习复杂的数据分布。我们认为追求完整的联合分布可能过于苛刻;为了提高数据效率,模型应优先学习条件分布$P(y\mid \bm{X})$,这由最近的理论分析所支持。因此,我们通过\textbf{ReTabSyn},一个提供合成器训练过程中特征相关性保留直接反馈的\textbf{Re}inforced \textbf{Tab}ular \textbf{Syn}thesis流程,克服了这一限制。这一目标鼓励生成器在数据有限时优先考虑最有用的预测信号,从而增强下游模型的实用性。我们通过这种做法对基于语言模型的生成器进行经验微调,并在具有小样本量、类别不平衡和分布偏移的基准测试中,ReTabSyn始终优于最先进的基线方法。此外,我们的方法可以轻松扩展到控制合成表格数据的各种方面,例如应用专家指定的生成观测约束。

英文摘要

Deep generative models can help with data scarcity and privacy by producing synthetic training data, but in low-data, imbalanced tabular settings they struggle to fully learn the complex data distribution. We argue that striving for the full joint distribution could be overkill; for greater data efficiency, models should prioritize learning the conditional distribution $P(y\mid \mathbf{X})$, as suggested by recent theoretical analysis. Therefore, we overcome this limitation with ReTabSyn, a Reinforced Tabular Synthesis pipeline that provides direct feedback on feature correlation preservation during synthesizer training. This objective encourages the generator to prioritize the most useful predictive signals when training data is limited, thereby strengthening downstream model utility. We empirically fine-tune a language model-based generator using this approach, and across benchmarks with small sample sizes, class imbalance, and distribution shift, ReTabSyn consistently outperforms state-of-the-art baselines. Moreover, our approach can be readily extended to control various aspects of synthetic tabular data, such as applying expert-specified constraints on generated observations.

2603.08977 2026-06-09 eess.AS cs.SD 版本更新

Universal Speech Content Factorization

通用语音内容分解

Henry Li Xinyuan, Zexin Cai, Lin Zhang, Leibny Paola García-Perera, Berrak Sisman, Sanjeev Khudanpur, Nicholas Andrews, Matthew Wiesner

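Affiliations: Center for Language and Speech Processing, Johns Hopkins University, USA; Human Language Technology Center of Excellence (COE), Johns Hopkins University, USA

AI summary: USCF is a simple, invertible linear method that extracts a low-rank speech representation in which speaker timbre is suppressed while phonetic content is preserved. It extends Speech Content Factorization to an open-set setting by learning a universal speech-to-content mapping via least-squares optimization and deriving speaker-specific transformations from a few seconds of target speech.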

Comments Accepted to Interspeech 2026

详情
AI中文摘要

我们提出通用语音内容分解(USCF),一种简单且可逆的线性方法,用于提取低秩语音表示,在其中抑制说话人音色同时保留语音内容。USCF通过最小二乘优化学习通用语音到内容映射,扩展了语音内容分解,一种封闭集语音转换(VC)方法,到开放集设置。我们通过嵌入分析显示USCF有效去除说话人依赖性变化。作为零样本语音转换系统,USCF在可懂度、自然度和说话人相似性方面与需要大量目标说话人数据或额外神经网络训练的方法相媲美。最后,我们证明作为训练高效的音色分离语音特征,USCF特征可作为训练音色提示文本到语音模型的声学表示。语音样本和代码已公开提供。

英文摘要

We propose Universal Speech Content Factorization (USCF), a simple and invertible linear method for extracting a low-rank speech representation in which speaker timbre is suppressed while phonetic content is preserved. USCF extends Speech Content Factorization, a closed-set voice conversion (VC) method, to an open-set setting by learning a universal speech-to-content mapping via least-squares optimization and deriving speaker-specific transformations from only a few seconds of target speech. We show through embedding analysis that USCF effectively removes speaker-dependent variation. As a zero-shot VC system, USCF achieves competitive intelligibility, naturalness, and speaker similarity compared to methods that require substantially more target-speaker data or additional neural training. Finally, we demonstrate that as a training-efficient timbre-disentangled speech feature, USCF features can serve as the acoustic representation for training timbre-prompted text-to-speech models. Speech samples and code are publicly available.

2602.23234 2026-06-09 cs.IR cs.AI cs.LG 版本更新

Scaling Search Relevance: Augmenting App Store Ranking with LLM-Generated Judgments

扩展搜索相关性:用LLM生成的判断增强应用商店排名

Evangelia Christakopoulou, Vivekkumar Patel, Hemanth Velaga, Sandip Gaikwad, Sean Suchter, Venkat Sundaranatha

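Affiliations: Apple

AI summary: To address the scarcity of expert textual relevance labels in App Store ranking, the authors fine-tune an LLM to generate millions of labels and add them to the production ranker alongside behavioral relevance, shifting the Pareto frontier outward and raising conversion rate by 0.24% in a worldwide A/B test.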

详情
AI中文摘要

大规模商业搜索系统优化相关性以驱动成功的会话,帮助用户找到他们想要的内容。为了最大化相关性,我们利用两个互补的目标:行为相关性(用户倾向于点击或下载的结果)和文本相关性(结果与查询的语义匹配)。一个持续的挑战是,相对于丰富的行为相关性标签,专家提供的文本相关性标签稀缺。我们首先通过系统评估LLM配置来解决这个问题,发现一个专门的、微调的模型在提供高度相关的标签方面显著优于一个更大的预训练模型。使用这个最优模型作为力量倍增器,我们生成了数百万个文本相关性标签以克服数据稀缺性。我们展示了用这些文本相关性标签增强我们的生产排序器会导致Pareto前沿显著外移:离线NDCG在行为相关性上改善,同时在文本相关性上也提高。这些离线收益通过在全球应用商店排序器上的A/B测试得到验证,该测试显示转化率统计上显著提高了+0.24%,其中最大的性能提升出现在尾部查询中,新的文本相关性标签在缺乏可靠行为相关性标签时提供了稳健的信号。

英文摘要

Large-scale commercial search systems optimize for relevance to drive successful sessions that help users find what they are looking for. To maximize relevance, we leverage two complementary objectives: behavioral relevance (results users tend to click or download) and textual relevance (a result's semantic fit to the query). A persistent challenge is the scarcity of expert-provided textual relevance labels relative to abundant behavioral relevance labels. We first address this by systematically evaluating LLM configurations, finding that a specialized, fine-tuned model significantly outperforms a much larger pre-trained one in providing highly relevant labels. Using this optimal model as a force multiplier, we generate millions of textual relevance labels to overcome the data scarcity. We show that augmenting our production ranker with these textual relevance labels leads to a significant outward shift of the Pareto frontier: offline NDCG improves for behavioral relevance while simultaneously increasing for textual relevance. These offline gains were validated by a worldwide A/B test on the App Store ranker, which demonstrated a statistically significant +0.24% increase in conversion rate, with the most substantial performance gains occurring in tail queries, where the new textual relevance labels provide a robust signal in the absence of reliable behavioral relevance labels.
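The offline metric named in this abstract, NDCG, can be sketched in a few lines. The version below uses a linear gain and a log2 position discount, which is one common variant. The abstract does not say which gain function or cutoff the authors use, so both are assumptions here, as are the function names and the example labels.

```python
import math

def dcg(relevances, k):
    """Discounted cumulative gain with linear gain and a log2 position discount."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))

def ndcg(relevances, k):
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical graded labels for one query, listed in ranked order.
ranked_labels = [3, 0, 2, 1]
score = ndcg(ranked_labels, k=4)
```

Computing this once with behavioral labels and once with textual labels for the same ranking yields the two axes along which a Pareto frontier like the one described in the abstract could be traced.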

2603.05026 2026-06-09 cs.SE cs.LG cs.MA 版本更新

RepoLaunch: Automating Build and Management of Code Repositories across Languages and Platforms

RepoLaunch:跨语言和平台的代码仓库构建与管理自动化

Kenan Li, Rongzhi Li, Linghao Zhang, Qirui Jin, Liao Zhu, Xiaosong Huang, Geng Zhang, Yikai Zhang, Shilin He, Chengxing Xie, Xin Zhang, Zijian Jin, Bowen Li, Chaoyun Zhang, Yu Kang, Yufan Huang, Elsie Nallipogu, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang

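Affiliations: Microsoft

AI summary: RepoLaunch is an agentic framework that automates dependency resolution, compilation, and test-result extraction for repositories across programming languages and operating systems. It reaches a 78% build success rate and drives a fully automated pipeline for creating SWE datasets.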

Comments Under peer review. 22 pages, 5 figures, 9 tables

详情
AI中文摘要

语言模型(LM)代理在自动化软件工程(SWE)中推动了显著进展,但大规模构建和测试软件仓库仍主要依赖人工操作。本文引入RepoLaunch,一种新的代理框架,能够自动解析依赖、编译源代码并提取测试结果,适用于多种编程语言和操作系统。RepoLaunch实现了78%的构建成功率,优于仅支持Python/Linux的先前系统18%。为展示其应用,我们进一步展示了由RepoLaunch驱动的全自动SWE数据集创建流水线,仅需在任务设计阶段进行人工输入。RepoLaunch已开源,其自动化任务生成流水线已被最近的代理基准测试和训练工作所采用。

英文摘要

Language model (LM) agents have driven substantial progress in automated software engineering (SWE), yet building and testing software repositories at scale remains a largely manual and labor-intensive bottleneck. In this work, we introduce RepoLaunch, a novel agentic framework that automatically resolves dependencies, compiles source code, and extracts test results across diverse programming languages and operating systems. RepoLaunch achieves a 78% build success rate, outperforming the Python/Linux-only prior system by 18%. To demonstrate its application, we further present a fully automated pipeline for SWE dataset creation driven by RepoLaunch, which only requires human input at the task-design stage. RepoLaunch is open-sourced, and its automated task-generation pipeline has already been adopted by several recent works on agentic benchmarking and training.

2603.04177 2026-06-09 cs.SE cs.AI cs.LG 版本更新

CodeTaste: Can LLMs Generate Human-Level Code Refactorings?

CodeTaste:LLM能否生成人类级别的代码重构?

Alex Thillen, Niels Mündler, Veselin Raychev, Martin Vechev

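Affiliations: University of California, Berkeley; ETH Zurich

AI summary: Using the CodeTaste benchmark, the paper studies how well LLM coding agents refactor code. Agents do well when a refactoring is specified in detail but often fail to discover the refactorings human developers chose; a propose-then-implement decomposition improves alignment.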

详情
AI中文摘要

LLM编码代理可以生成可工作的代码,但它们的解决方案往往积累复杂性、重复和架构债务。人类开发者通过重构来解决这些问题:行为保持的程序转换,改善结构和可维护性。我们研究代理是否(i)能够可靠地执行重构,以及(ii)识别人类开发者在实际代码库中实际选择的重构。为此,我们构建了CodeTaste,一个从大型多文件开源重构中挖掘的基准测试。为了评分解决方案,我们结合了测量功能正确性的仓库测试套件和定制的静态检查,这些检查使用数据流推理验证不期望模式的移除和期望模式的引入。我们的结果显示了一个明显的差距:代理在实现详细指定的重构时表现良好,但当给定变更的关注区域时,往往无法发现人类的重构选择。先提议后实现的分解改善了对齐,而在实现之前选择最佳对齐的提议可以带来进一步的收益。CodeTaste为在现实代码库中将编码代理与人类重构决策对齐提供了评估目标和潜在的偏好信号。我们发布了基准测试、排行榜和代码。

英文摘要

LLM coding agents can generate working code, but their solutions often accumulate complexity, duplication, and architectural debt. Human developers address such issues through refactoring: behavior-preserving program transformations that improve structure and maintainability. We investigate whether agents (i) can execute refactorings reliably and (ii) identify the refactorings that human developers actually chose in real codebases. To this end, we construct CodeTaste, a benchmark mined from large multi-file open-source refactorings. To score solutions, we combine repository test suites that measure functional correctness with tailored static checks that verify removal of undesired and introduction of desired code patterns using dataflow reasoning. Our results show a clear gap: agents perform well at implementing refactorings that are specified in detail, but often fail to discover the human refactoring choices when given a focus area for changes. A propose-then-implement decomposition improves alignment, and selecting the best-aligned proposal before implementation can yield further gains. CodeTaste provides an evaluation target and a potential preference signal for aligning coding agents with human refactoring decisions in realistic codebases. We release the benchmark, leaderboard, and code.

2510.16028 2026-06-09 cs.CR cs.AI cs.LG cs.SY eess.SY 版本更新

TAO: Tolerance-Aware Optimistic Verification for Floating-Point Neural Networks

TAO:面向浮点神经网络的容忍感知乐观验证

Jianzhu Yao, Hongxu Su, Taobo Liao, Zerui Cheng, Huan Zhang, Xuechao Wang, Pramod Viswanath

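Affiliations: Princeton University; HKUST (GZ); University of Illinois Urbana-Champaign

AI summary: TAO is a protocol that verifies floating-point neural network outputs using operator-level tolerance regions and a Merkle-anchored dispute game, without trusted hardware or deterministic kernels, at 0.3% runtime overhead on Qwen3-8B.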

Comments 18 pages, 8 figures

Details
Journal ref
Proceedings of the 21st European Conference on Computer Systems, (2026) 1515-1532
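AI Chinese abstract (English translation)

Neural networks increasingly run on hardware that users do not control (cloud GPUs, inference marketplaces). Yet ML-as-a-Service rarely discloses what actually ran or whether the returned outputs faithfully reflect the intended inputs. Users have no recourse against service downgrades (model swaps, quantization, graph rewrites, or discrepancies such as altered ad embeddings). Verifying outputs is hard because floating-point execution on heterogeneous accelerators is inherently nondeterministic. Existing approaches are either impractical for real floating-point neural networks or reintroduce vendor trust. We propose TAO, a tolerance-aware optimistic verification protocol that accepts outputs lying within principled operator-level acceptance regions instead of requiring bitwise equality. TAO combines two error models: (i) per-operator IEEE-754 worst-case bounds and (ii) tight empirical percentile profiles calibrated across hardware. A discrepancy triggers a Merkle-anchored, threshold-guided dispute game that recursively partitions the computation graph until a single operator remains, at which point adjudication reduces to a lightweight theoretical-bound check or a small honest-majority vote against empirical thresholds. Unchallenged results are finalized after a challenge window, with no need for trusted hardware or deterministic kernels. We implement TAO as a PyTorch-compatible runtime and a contract layer currently deployed on the Ethereum Holesky testnet. The runtime instruments graphs, computes per-operator bounds, and runs unmodified vendor kernels in FP32 with negligible overhead (0.3% on Qwen3-8B). Across CNNs, Transformers, and diffusion models on A100, H100, RTX6000, and RTX4090, empirical thresholds are $10^2$ to $10^3$ times tighter than theoretical bounds, and bound-aware adversarial attacks have a 0% success rate. In short, TAO reconciles scalability and verifiability for real-world heterogeneous ML compute.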

English abstract

Neural networks increasingly run on hardware outside the user's control (cloud GPUs, inference marketplaces). Yet ML-as-a-Service reveals little about what actually ran or whether returned outputs faithfully reflect the intended inputs. Users lack recourse against service downgrades (model swaps, quantization, graph rewrites, or discrepancies like altered ad embeddings). Verifying outputs is hard because floating-point (FP) execution on heterogeneous accelerators is inherently nondeterministic. Existing approaches are either impractical for real FP neural networks or reintroduce vendor trust. We present TAO: a Tolerance-Aware Optimistic verification protocol that accepts outputs within principled operator-level acceptance regions rather than requiring bitwise equality. TAO combines two error models: (i) sound per-operator IEEE-754 worst-case bounds and (ii) tight empirical percentile profiles calibrated across hardware. Discrepancies trigger a Merkle-anchored, threshold-guided dispute game that recursively partitions the computation graph until one operator remains, where adjudication reduces to a lightweight theoretical-bound check or a small honest-majority vote against empirical thresholds. Unchallenged results finalize after a challenge window, without requiring trusted hardware or deterministic kernels. We implement TAO as a PyTorch-compatible runtime and a contract layer currently deployed on the Ethereum Holesky testnet. The runtime instruments graphs, computes per-operator bounds, and runs unmodified vendor kernels in FP32 with negligible overhead (0.3% on Qwen3-8B). Across CNNs, Transformers, and diffusion models on A100, H100, RTX6000, and RTX4090, empirical thresholds are $10^2$ to $10^3$ times tighter than theoretical bounds, and bound-aware adversarial attacks achieve 0% success. Together, TAO reconciles scalability with verifiability for real-world heterogeneous ML compute.
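The dispute game described in this abstract (recursively partition the computation until one operator remains, then check it against a tolerance) can be sketched as a bisection over a linear chain of operators. This is a toy illustration under stated assumptions, not the paper's protocol: the operators are scalar functions, a single absolute tolerance `TOL` replaces the per-operator IEEE-754 bounds and empirical thresholds, Merkle commitments and the on-chain contract are omitted, and all function names are hypothetical.

```python
from typing import Callable, List

Op = Callable[[float], float]


def run_trace(ops: List[Op], x: float) -> List[float]:
    """Return [s_0, ..., s_n] with s_0 = x and s_{i+1} = ops[i](s_i)."""
    trace = [x]
    for op in ops:
        trace.append(op(trace[-1]))
    return trace


def agree(a: float, b: float, tol: float) -> bool:
    """Tolerance-aware comparison: accept values within tol, not only bitwise-equal ones."""
    return abs(a - b) <= tol


def locate_disputed_op(proposer: List[float], challenger: List[float], tol: float) -> int:
    """Bisect two traces that agree at index 0 and disagree at the last index.

    Returns the index i of the single operator whose input both parties accept
    (state i) but whose output they dispute (state i + 1).
    """
    lo, hi = 0, len(proposer) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if agree(proposer[mid], challenger[mid], tol):
            lo = mid  # parties agree up to mid, so the dispute lies later
        else:
            hi = mid  # parties already disagree at mid, so the dispute lies earlier
    return lo


def adjudicate(ops: List[Op], proposer: List[float], index: int, tol: float) -> bool:
    """Re-execute only the disputed operator; True means the proposer's claim stands."""
    recomputed = ops[index](proposer[index])
    return agree(proposer[index + 1], recomputed, tol)


TOL = 1e-6  # assumed single absolute tolerance for every operator
ops = [lambda v: 2.0 * v, lambda v: v + 1.0, lambda v: v * v, lambda v: -v]

honest = run_trace(ops, 3.0)               # states: 3, 6, 7, 49, -49
cheating = honest[:3] + [50.0, -50.0]      # proposer falsifies the output of ops[2]

disputed = locate_disputed_op(cheating, honest, TOL)
proposer_wins = adjudicate(ops, cheating, disputed, TOL)

# A proposer whose states differ only by tiny numerical noise stays within tolerance.
noisy = [s + 1e-9 for s in honest]
noisy_ok = adjudicate(ops, noisy, 2, TOL)
```

The bisection needs only a logarithmic number of comparisons in the chain length, and the final check re-executes a single operator. Extending the sketch to a general computation graph and to per-operator tolerances would require the partitioning and bound machinery the abstract refers to.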