arXivDaily: arXiv daily academic digest, updated Monday through Friday
2505.04907 2026-06-15 cs.LG

VaCDA: Variational Contrastive Alignment-based Scalable Human Activity Recognition

Soham Khisa, Avijoy Chakma

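Affiliations Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology; Department of Computer Science, Bowie State University

AI summary This paper proposes VaCDA, a framework that combines variational autoencoders and contrastive learning to address data heterogeneity in multi-source domain adaptation. It is evaluated in cross-person, cross-position, and cross-device activity recognition scenarios and outperforms the baselines in the latter two.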

Abstract

Technological advancements have led to the rise of wearable devices with sensors that continuously monitor user activities, generating vast amounts of unlabeled data. This data is challenging to interpret, and manual annotation is labor-intensive and error-prone. Additionally, data distribution is often heterogeneous due to device placement, type, and user behavior variations. As a result, traditional transfer learning methods perform suboptimally, making it difficult to recognize daily activities. To address these challenges, we use a variational autoencoder (VAE) to learn a shared, low-dimensional latent space from available sensor data. This space generalizes data across diverse sensors, mitigating heterogeneity and aiding robust adaptation to the target domain. We integrate contrastive learning to enhance feature representation by aligning instances of the same class across domains while separating different classes. We propose Variational Contrastive Domain Adaptation (VaCDA), a multi-source domain adaptation framework combining VAEs and contrastive learning to improve feature representation and reduce heterogeneity between source and target domains. We evaluate VaCDA on multiple publicly available datasets across three heterogeneity scenarios: cross-person, cross-position, and cross-device. VaCDA outperforms the baselines in cross-position and cross-device scenarios.

2311.05139 2026-06-15 cs.LG

Hard-Negative Sampling for Contrastive Learning: Optimal Representation Geometry and Neural- vs Dimensional-Collapse

Ruijie Jiang, Thuan Nguyen, Shuchin Aeron, Prakash Ishwar

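Affiliations Department of Electrical Engineering, Tufts University; Department of Engineering, Engineering Technology, East Tennessee State University; Department of Electrical and Computer Engineering, Boston University

AI summary This paper proves that the SCL, HSCL, and UCL losses are minimized by representations exhibiting Neural-Collapse (NC) geometry, and that the HSCL and HUCL losses are lower bounded by the corresponding SCL and UCL losses. Empirically, with random initialization, suitable hardness levels, and feature normalization, Adam optimization converges to the NC geometry, whereas omitting hard negatives or feature normalization leads to Dimensional-Collapse.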

Comments Final version: Reviewed and accepted to TMLR April 2025. Updated exposition, Added analysis of lower bounds

Journal ref Transactions on Machine Learning Research, 2025

Abstract

For a widely-studied data model and general loss and sample-hardening functions we prove that the losses of Supervised Contrastive Learning (SCL), Hard-SCL (HSCL), and Unsupervised Contrastive Learning (UCL) are minimized by representations that exhibit Neural-Collapse (NC), i.e., the class means form an Equiangular Tight Frame (ETF) and data from the same class are mapped to the same representation. We also prove that for any representation mapping, the HSCL and Hard-UCL (HUCL) losses are lower bounded by the corresponding SCL and UCL losses. In contrast to existing literature, our theoretical results for SCL do not require class-conditional independence of augmented views and work for a general loss function class that includes the widely used InfoNCE loss function. Moreover, our proofs are simpler, compact, and transparent. Similar to existing literature, our theoretical claims also hold for the practical scenario where batching is used for optimization. We empirically demonstrate, for the first time, that Adam optimization (with batching) of HSCL and HUCL losses with random initialization and suitable hardness levels can indeed converge to the NC-geometry if we incorporate unit-ball or unit-sphere feature normalization. Without incorporating hard-negatives or feature normalization, however, the representations learned via Adam suffer from Dimensional-Collapse (DC) and fail to attain the NC-geometry. These results exemplify the role of hard-negative sampling in contrastive representation learning and we conclude with several open theoretical problems for future work. The code can be found at https://github.com/rjiang03/HCL/tree/main

2406.11565 2026-06-15 cs.CL cs.CY

Extrinsic Evaluation of Cultural Competence in Large Language Models

Shaily Bhatt, Fernando Diaz

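Affiliations Carnegie Mellon University

AI summary This paper evaluates cultural competence extrinsically on two text generation tasks. It finds that nationality cues in the prompt do change model outputs, but that the similarity of outputs across countries correlates only weakly with those countries' cultural values.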

Comments Accepted to EMNLP Findings 2024

Abstract

Productive interactions between diverse users and language technologies require outputs from the latter to be culturally relevant and sensitive. Prior works have evaluated models' knowledge of cultural norms, values, and artifacts, without considering how this knowledge manifests in downstream applications. In this work, we focus on extrinsic evaluation of cultural competence in two text generation tasks, open-ended question answering and story generation. We quantitatively and qualitatively evaluate model outputs when an explicit cue of culture, specifically nationality, is perturbed in the prompts. Although we find that model outputs do vary when varying nationalities and feature culturally relevant words, we also find weak correlations between text similarity of outputs for different countries and the cultural values of these countries. Finally, we discuss important considerations in designing comprehensive evaluation of cultural competence in user-facing tasks.

2406.03221 2026-06-15 cs.CL cs.IR

Linking Named Entities in Diderot's Encyclopédie to Wikidata

Pierre Nugues

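Affiliations Université de Lausanne

AI summary This paper links more than 10,300 Encyclopédie entries to Wikidata identifiers, connecting them to the knowledge graph, and describes the annotation process for geographic and human entities along with application examples.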

Comments 6 pages, 3 figures

Journal ref Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pp. 10610--10615

Abstract

Diderot's Encyclopédie is a reference work from XVIIIth-century Europe that aimed at collecting the knowledge of its era. Wikipedia has the same ambition with a much greater scope. However, the lack of a digital connection between the two encyclopedias may hinder their comparison and the study of how knowledge has evolved. A key element of Wikipedia is Wikidata, which backs the articles with a graph of structured data. In this paper, we describe the annotation of more than 10,300 of the Encyclopédie entries with Wikidata identifiers, enabling us to connect these entries to the graph. We considered geographic and human entities. The Encyclopédie does not contain standalone biographic entries; biographies mostly appear as subentries of locations. We extracted all the geographic entries and we completely annotated all the entries containing a description of human entities. This represents more than 2,600 links referring to locations or human entities. In addition, we annotated more than 9,500 entries having geographic content only. We describe the annotation process as well as application examples. This resource is available at https://github.com/pnugues/encyclopedie_1751

2209.00078 2026-06-15 cs.LG

Supervised Contrastive Learning with Hard Negative Samples

Ruijie Jiang, Thuan Nguyen, Prakash Ishwar, Shuchin Aeron

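Affiliations Dept. of ECE, Tufts University; Dept. of CS, Tufts University; Dept. of ECE, Boston University

AI summary This paper proposes H-SCL, which tilts the class-conditional negative sampling distribution via a hardening function, reports performance gains over SCL on downstream classification tasks, and analyzes the relationship between the H-SCL and H-UCL losses.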

Journal ref 2024 International Joint Conference on Neural Networks (IJCNN), pp. 1-8, 2024

Abstract

Through minimization of an appropriate loss function such as the InfoNCE loss, contrastive learning (CL) learns a useful representation function by pulling positive samples close to each other while pushing negative samples far apart in the embedding space. The positive samples are typically created using "label-preserving" augmentations, i.e., domain-specific transformations of a given datum or anchor. In the absence of class information, in unsupervised CL (UCL), the negative samples are typically chosen randomly and independently of the anchor from a preset negative sampling distribution over the entire dataset. This leads to class collisions in UCL. Supervised CL (SCL) avoids this class collision by conditioning the negative sampling distribution on samples having labels different from that of the anchor. In hard-UCL (H-UCL), which has been shown to be an effective method to further enhance UCL, the negative sampling distribution is conditionally tilted, by means of a hardening function, towards samples that are closer to the anchor. Motivated by this, in this paper we propose hard-SCL (H-SCL), wherein the class-conditional negative sampling distribution is tilted via a hardening function. Our simulation results confirm the utility of H-SCL over SCL with significant performance gains in downstream classification tasks. Analytically, we show that in the limit of infinite negative samples per anchor and under a suitable assumption, the H-SCL loss is upper bounded by the H-UCL loss, thereby justifying the utility of H-UCL for controlling the H-SCL loss in the absence of label information. Through experiments on several datasets, we verify the assumption as well as the claimed inequality between H-UCL and H-SCL losses. We also provide a plausible scenario where H-SCL loss is lower bounded by UCL loss, indicating the limited utility of UCL in controlling the H-SCL loss.
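
Note: the tilting idea in the abstract above can be illustrated with a small sketch. It restricts negatives to samples whose label differs from the anchor's (the SCL conditioning) and then reweights them with an exponential hardening function of their similarity to the anchor. The exponential form, the parameter `beta`, and every function name here are assumptions made for illustration; the abstract does not specify the paper's hardening function.

```python
import math

def dot(u, v):
    """Inner product used as the similarity between two embeddings."""
    return sum(a * b for a, b in zip(u, v))

def scl_negatives(anchor_label, samples, labels):
    """SCL-style conditioning: keep only samples with a different label."""
    return [x for x, y in zip(samples, labels) if y != anchor_label]

def hard_negative_weights(anchor, negatives, beta=1.0):
    """Weights proportional to exp(beta * similarity); beta=0 gives uniform weights.

    The exponential hardening function is an assumed example, not the paper's.
    """
    scores = [math.exp(beta * dot(anchor, n)) for n in negatives]
    total = sum(scores)
    return [s / total for s in scores]

anchor = [1.0, 0.0]
samples = [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
labels = [0, 1, 1]

negatives = scl_negatives(0, samples, labels)       # drops the same-class sample
weights = hard_negative_weights(anchor, negatives, beta=2.0)
uniform = hard_negative_weights(anchor, negatives, beta=0.0)
```

With `beta > 0` the negative closer to the anchor receives the larger weight, which is the sense in which the distribution is tilted towards harder negatives.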

2606.14693 2026-06-15 cs.MA cs.AI (new submission)

Learning Coordinated Preference for Multi-Objective Multi-Agent Reinforcement Learning

Pengxin Wang, Lihao Guo, Yi Xie, Bo Liu, Siyang Cao, Jingdi Chen

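Affiliations Department of Electrical and Computer Engineering, University of Arizona

AI summary Proposes Preference Coordinated Multi-agent Policy Optimization (PCMA), which learns coordinated agent-specific preferences to enable complementary trade-offs in multi-objective multi-agent reinforcement learning. Theory shows that, under suitable conditions, preference diversity can induce team improvement, and experiments show gains in both performance and trade-off coordination.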

Abstract

Cooperative multi-objective multi-agent reinforcement learning (MOMARL) models team decision making under multiple, potentially conflicting objectives. In this setting, conflicts arise not only across objectives but also across agents with different observations, roles, and contributions. We propose Preference Coordinated Multi-agent Policy Optimization (PCMA), which learns coordinated agent-specific preferences to enable complementary trade-offs among agents. Theoretically, we formulate cooperative MOMARL as a team-optimal game and show that, under suitable conditions, preference diversity can induce team improvement through a first-order improvement decomposition. Experiments on multiple cooperative MOMA environments and a practical traffic-control scenario show that PCMA improves both performance and trade-off coordination.
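
Note: one way to picture agent-specific preferences is as weight vectors over the objectives, with each agent scalarizing the shared vector-valued reward by its own weights. The sketch below uses plain linear scalarization, which is a common choice in multi-objective RL but is an assumption here; the abstract does not say how PCMA represents or learns preferences. The objective names, the numbers, and the helper functions are hypothetical.

```python
def normalize(weights):
    """Scale non-negative weights so they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]

def scalarize(reward_vector, preference):
    """Linear scalarization: preference-weighted sum of the objectives."""
    return sum(w * r for w, r in zip(preference, reward_vector))

# Hypothetical team reward over two objectives, e.g. [throughput, safety].
team_reward = [4.0, 1.0]

# Two agents with complementary (fixed, hand-picked) preferences.
preferences = {
    "agent_a": normalize([3.0, 1.0]),  # mostly cares about objective 0
    "agent_b": normalize([1.0, 3.0]),  # mostly cares about objective 1
}

signals = {name: scalarize(team_reward, w) for name, w in preferences.items()}
```

Here the preferences are fixed by hand; in the setting the abstract describes they would be learned and coordinated across agents.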

2606.14629 2026-06-15 cs.CR cs.AI (new submission)

When Good Verifiers Go Bad: Self-Improving VLMs Can Regress on New Tasks

Jianzhe Lin

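Affiliations MetaAI

AI summary This paper finds that in verifier-driven self-DPO, verifier quality is task-specific: on tasks where the verifier's rubric accuracy is low, training regresses the student model even though the DPO loss keeps decreasing. It gives a mechanistic explanation and deployment recommendations.

Comments 12 pages, 2 figures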

Abstract

Verifier-driven self-DPO is a common recipe for self-improving production visual-language models. In this setup, a frozen verifier scores candidate generations, the top- and bottom-scoring candidates form a preference example, and DPO updates the learner. The deployment-time assumption is monotone: a stronger verifier should yield a stronger student. We show that this assumption can fail because verifier quality is highly task-specific. On a four-rung open-source verifier ladder across MathVista, MMMU, and BLINK, the same verifiers that are above-threshold and improve a Qwen-3-VL-2B student on MathVista become sub-threshold on MMMU, where their task-rubric accuracy drops to 8% to 23%. In this regime, every verifier we tested silently regresses the student, producing drops of 3.4 to 10.9 percentage points below the frozen baseline while the DPO training loss continues to decrease. The regression replicates on a second student, Qwen-2.5-VL-3B. Moreover, within the failure regime, damage is confidence-inverted: the more accurate-but-still-wrong verifier causes larger regression than a near-random verifier, suggesting that progress-gated replay amplifies confidently wrong preference pairs. We give a compact mechanistic explanation via a variance theorem for progress-gated replay and its direction-mismatch failure mode. The deployment message is operational rather than purely diagnostic: before running any verifier-driven loop, teams should measure target-task rubric accuracy, rank verifiers by target-task rubric quality rather than parameter count, and treat diminishing returns in above-threshold regimes as a verifier-side compute budget cap.
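
Note: the loop and the pre-flight check described in the abstract above can be sketched in a few lines. The sketch builds one preference example from the top- and bottom-scoring candidates and measures the verifier's accuracy on labeled target-task examples before any training would run. The toy length-based verifier, the labeled examples, and the threshold value of 0.5 are hypothetical placeholders; the abstract gives no numeric threshold.

```python
def build_preference_pair(candidates, verifier_score):
    """Return (chosen, rejected): the top- and bottom-scoring candidates."""
    ranked = sorted(candidates, key=verifier_score)
    return ranked[-1], ranked[0]

def rubric_accuracy(verifier_judgment, labeled_examples):
    """Fraction of (candidate, is_correct) pairs the verifier judges correctly."""
    hits = sum(
        1 for candidate, truth in labeled_examples
        if verifier_judgment(candidate) == truth
    )
    return hits / len(labeled_examples)

# Toy verifier (assumption): longer answers score higher.
def toy_score(candidate):
    return len(candidate)

def toy_judgment(candidate):
    return len(candidate) >= 2

chosen, rejected = build_preference_pair(["a", "abc", "ab"], toy_score)

# Hypothetical labeled target-task examples: (candidate, is_correct).
labeled = [("abc", True), ("a", False), ("ab", True), ("abcd", False)]
accuracy = rubric_accuracy(toy_judgment, labeled)

THRESHOLD = 0.5  # placeholder value, not taken from the paper
run_loop = accuracy >= THRESHOLD
```

The point of the check is ordering: accuracy is measured on the target task first, and the DPO loop is only started if the verifier clears the bar there.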

2606.14594 2026-06-15 cs.SE cs.AI (new submission)

Regulating the Machine Contributor: Governance and Policy Alignment in Open Source

Jassem Manita, Aziz Amari

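Affiliations Faculty of Sciences of Tunis (FST), University of Tunis El Manar; National Institute of Applied Science and Technology (INSAT), University of Carthage

AI summary Addressing the governance problems that AI agents raise in open source, this paper compares the contribution policies of six organisations, derives a six-dimensional taxonomy and an ordinal Policy Maturity Score, maps documented agent incidents onto the dimensions each policy fails to govern, identifies gaps shared by regulatory frameworks and policies, and sketches a harmonised tiered framework.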

Abstract

AI-assisted software development has moved from line-level autocomplete to agents that can plan changes, edit files, and submit pull requests with limited human supervision. Open-source software, however, evolves through a process designed for humans: contributor agreements, codes of conduct, and review norms all assume a legally accountable person who can attest to provenance and answer reviewer questions. Autonomous and semi-autonomous AI contributors strain those assumptions, and the 2025-2026 record of agent-driven incidents, AI-generated nuisance volume, and platform-level shutdowns shows that the gap is operationally consequential. Several open-source organisations have responded with contribution policies, but the result is fragmented, and its alignment with emerging AI governance frameworks (EU AI Act, NIST AI RMF with the UC Berkeley Agentic AI Profile, ISO/IEC 42001 and 23894) is unmapped at the contribution level. We compare policies across six organisations (SymPy, LLVM, matplotlib, OpenInfra, the Apache Software Foundation, and the Linux Foundation) using Most-Similar Systems Design with indicator-based coding and process tracing for SymPy and LLVM. From this we derive a six-dimensional taxonomy (disclosure, responsibility, human oversight, licensing, enforcement, maintainer workload), an ordinal Policy Maturity Score, and a mapping of documented agent incidents onto the dimensions each policy fails to govern. Aligning the dimensions with the regulatory frameworks above identifies overlapping gaps neither side currently closes, and we close by sketching the shape of a harmonised tiered framework and the empirical evaluation needed to calibrate it.

2606.14592 2026-06-15 stat.ML cs.LG stat.AP stat.ME (new submission)

Cluster LOCO: Feature Importance For Interpreting Clusters

Claire M. He, Genevera I. Allen

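Affiliations Department of Statistics, Columbia University

AI summary Proposes Cluster LOCO, a model-agnostic family of feature importance scores for clustering that uses feature occlusion and clustering generalizability to identify the features driving the discovered cluster structure.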

Comments 36 pages, 12 figures

Abstract

Clustering is widely used for exploratory analysis and scientific discovery, driving insights from market segmentation to biological data analysis, but its outputs can be difficult to interpret, audit, and reproduce as modern datasets become increasingly large and complex. Reliable use of clustering requires understanding which features drive the discovered structure, yet feature-level explanations for clustering remain scarce compared with methods in supervised learning. Furthermore, existing clustering feature importance scores are often tied to specific algorithms and data assumptions. To address these challenges, we propose Cluster LOCO (Leave-One-Covariate-Out), a family of model-agnostic feature importance scores for clustering. Cluster LOCO is built on feature occlusion and clustering generalizability, defined as whether cluster labels learned on one subset of the data can be accurately predicted on held-out samples. For any chosen clustering algorithm, Cluster LOCO quantifies a feature's importance by measuring how much its removal degrades generalizability. We first introduce Cluster LOCO-Split, which relies on data splitting, and then extend it to Cluster LOCO-MP, a minipatch ensemble-based version designed for large-scale data. Across synthetic simulations and an application to cell-type discovery in single-cell transcriptomics, we show that Cluster LOCO more reliably recovers informative features than existing clustering feature importance methods.
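
Note: the leave-one-covariate-out idea in the abstract above can be sketched as follows. This is a simplified illustration in the spirit of the data-splitting variant, not the authors' procedure: cluster labels come from a tiny 2-means routine, held-out labels are predicted with a nearest-centroid rule fitted on the training split, and a feature's score is the drop in held-out accuracy when that feature is removed. The function names, the predictor, the fixed split, and the toy data are all assumptions.

```python
def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def nearest(point, centroids):
    """Index of the closest centroid (ties go to the lower index)."""
    return min(range(len(centroids)), key=lambda c: sq_dist(point, centroids[c]))

def mean_point(points):
    return [sum(col) / len(points) for col in zip(*points)]

def two_means(points, iters=10):
    """Tiny 2-means with deterministic init at the first and last points."""
    centroids = [points[0], points[-1]]
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        for c in range(2):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = mean_point(members)
    return [nearest(p, centroids) for p in points]

def drop_feature(points, j):
    return [[v for i, v in enumerate(p) if i != j] for p in points]

def generalizability(points, labels, train_idx, test_idx):
    """Held-out accuracy of a nearest-centroid predictor fitted on the training split.

    Assumes every cluster has at least one training sample.
    """
    k = max(labels) + 1
    centroids = [
        mean_point([points[i] for i in train_idx if labels[i] == c])
        for c in range(k)
    ]
    hits = sum(nearest(points[i], centroids) == labels[i] for i in test_idx)
    return hits / len(test_idx)

def loco_scores(points, cluster_fn, train_idx, test_idx):
    """Score of feature j = accuracy with all features minus accuracy without j."""
    labels = cluster_fn(points)
    base = generalizability(points, labels, train_idx, test_idx)
    return [
        base - generalizability(drop_feature(points, j), labels, train_idx, test_idx)
        for j in range(len(points[0]))
    ]

# Toy data: feature 0 separates the two groups, feature 1 does not.
data = [[0.0, 1.0], [0.2, 2.0], [0.1, 1.0], [0.3, 2.0],
        [10.0, 1.0], [10.2, 2.0], [10.1, 1.0], [10.3, 2.0]]
scores = loco_scores(data, two_means, train_idx=[0, 1, 4, 5], test_idx=[2, 3, 6, 7])
```

On this toy data, removing feature 0 makes the held-out labels unpredictable while removing feature 1 changes nothing, so feature 0 receives the higher score. Any clustering function with the same call signature could be passed in place of `two_means`.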

2606.14589 2026-06-15 cs.SE cs.AI cs.DC (new submission)

When Errors Become Narratives: A Longitudinal Taxonomy of Silent Failures in a Production LLM Agent Runtime

Wei Wu

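Affiliations Independent researcher

AI summary An eight-week study of a production personal-assistant agent runtime documents 22 incidents in which a silent-failure meta-pattern appeared at least 28 times, and derives a five-class, mechanism-oriented taxonomy. Class D (chained hallucination and fabrication) is unique to LLM systems and the most dangerous: the LLM turns an error into a fluent, plausible narrative delivered to the user.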

Comments 18 pages, 5 figures, 2 tables. 22 incident postmortems and all defense-framework artifacts publicly available at https://github.com/bisdom-cell/openclaw-model-bridge; governance engine on PyPI (openclaw-ontology-engine)

Abstract

LLM agent systems increasingly run as long-lived autonomous runtimes: scheduling jobs, calling tools, maintaining memory, and pushing results to humans. We present a longitudinal study of silent failures in one such system: a personal-assistant agent runtime in continuous production since March 2026, with roughly 40 scheduled jobs, 8 LLM providers, a tool-governance proxy, and a knowledge-base memory plane, defended by 4,286 unit tests and 827 governance checks. Over eight weeks we documented 22 incidents with full root-cause postmortems, in which one meta-pattern -- a failure whose error signal never reaches a human in actionable form -- manifested at least 28 times. We derive a five-class, mechanism-oriented taxonomy: (A) environment and platform quirks, (B) design-assumption mismatches, (C) error swallowing and dilution, (D) chained hallucination and fabrication, (E) operational omission and forensic blind spots. Class D is unique to LLM systems and the most dangerous: the system does not merely fail to report an error -- the LLM transforms it into fluent, plausible narrative delivered to the user. We term this fail-plausible: gray failure's differential observability escalated -- the observer is not just blind, it is convincingly lied to by the failure itself. Three findings: about 70% of silent failures were caught by human user-view observation, not tests or audits; a retrospective audit of 15 incidents found 0% ex-ante prevention but 87% regression blocking -- audits are regression engines, not prediction engines; incident latency (13 hours to 60 days) tracks failure mechanism, not code complexity -- the longest-lived failures lived in the seams between components, where no test runs. We describe the resulting defense framework and distill design principles for agent systems whose failures are loud, attributable, and boring. All postmortems and artifacts are public.
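
Note: the contrast between error swallowing (class C in the taxonomy above) and a failure that is loud and attributable can be shown with a minimal sketch. It is not taken from the paper's defense framework; the job, the exception class, and the wrapper names are hypothetical.

```python
class JobFailure(Exception):
    """Raised so a failed job cannot be mistaken for an empty result."""

def run_swallowing(job):
    """Anti-pattern: the error is swallowed and an empty string flows downstream."""
    try:
        return job()
    except Exception:
        return ""

def run_loud(job, name):
    """Attach the job name and re-raise so the failure stays attributable."""
    try:
        return job()
    except Exception as exc:
        raise JobFailure(f"job {name!r} failed: {exc!r}") from exc

def fetch_feed():
    """Hypothetical scheduled job that always fails."""
    raise TimeoutError("upstream timed out")

# The swallowed failure is indistinguishable from "nothing to report".
silent = run_swallowing(fetch_feed)

# The loud failure carries the job name and the original error.
try:
    run_loud(fetch_feed, "fetch_feed")
    loud_message = None
except JobFailure as err:
    loud_message = str(err)
```

In the swallowing version a downstream LLM would receive an empty string and could narrate around it, which is the path to the fail-plausible behaviour the abstract describes.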

2606.14570 2026-06-15 physics.ao-ph cs.AI cs.LG (new submission)

Regional Climate Model Emulation with Diffusion Approaches: What is the Added Value of Generative Machine Learning?

Mikel N. Legasa, Antoine Doury, Achille Gellens, Redouane Lguensat, Clara Naldesi, Soulivanh Thao, Mathieu Vrac

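Affiliations University of Cambridge; CNRS; Institut Pierre Simon Laplace

AI summary This paper introduces ParamDiffusion, a two-stage diffusion framework, and assesses the added value of generative machine learning for regional climate model emulation relative to deterministic methods. Diffusion approaches reproduce climatological precipitation statistics with high skill, but none of the assessed models consistently captures the most extreme events within its uncertainty envelope.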

Comments Submitted to Journal of Advances in Modeling Earth Systems (JAMES)

Abstract

Emulators provide a cost-effective alternative to regional climate models (RCMs) by capturing their dynamical downscaling function. They link large-scale predictors simulated by global climate models (GCMs) to RCM-simulated high-resolution fields of the target variable, here precipitation. Machine learning methods, typically deep learning, are cheaper than running RCMs in computation time and energy. Among them, generative models are appealing because they can simulate ensembles of local high-resolution fields consistent with the predictors. This ensemble, which we call the uncertainty envelope, remains to be properly assessed for added value. Here, we make three contributions. First, we introduce ParamDiffusion, a new two-stage diffusion-based framework, and compare it with a state-of-the-art diffusion approach. Second, we expand standard validation through a comprehensive framework aligned with climate-science needs, examining specific precipitation events, including extremes. Third, within this framework, we assess the added value of diffusion approaches relative to deterministic methods. We intercompare four deep-learning models: a deterministic model designed to capture the precipitation tail; a parametric probabilistic model based on it; a recently proposed diffusion approach; and ParamDiffusion, which couples the parametric model with a diffusion model. Our results show that diffusion-based approaches reproduce climatological precipitation statistics with high skill, including distributional tails and spatially compounded extremes, while generating spatially detailed fields. However, none of the assessed models consistently accounts for the most extreme RCM-simulated events within its uncertainty envelope. Diffusion models are therefore promising for probabilistic RCM emulation, but progress is still required before they can reliably represent high-impact precipitation extremes.

2606.14568 2026-06-15 eess.IV cs.CV (new submission)

Trimodal Glioma Representation Alignment via Volumetric Contrastive Learning

Denise Marini, Eleonora Grassucci, Danilo Comminiello

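Affiliations arXiv

AI summary Proposes GLORIA, a framework that aligns histopathology, gene-expression, and MRI features with a Gramian contrastive loss for glioma grading and survival prediction. On a matched cohort of 132 patients, it improves over a bimodal WSI-mRNA baseline.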

Abstract

Glioma grading and survival prediction require the integration of heterogeneous information collected at different spatial and biological scales. Histopathology describes tissue morphology, mRNA expression captures molecular activity, and magnetic resonance imaging provides a non-invasive view of tumor extent and radiological heterogeneity. Existing glioma prognosis models often combine only two of these sources, while their alignment objectives remain mostly pairwise. This paper introduces GLORIA, a novel trimodal framework for GLioma Omics - Radiology - hIstopathology Alignment. GLORIA processes whole-slide image regions, gene-expression profiles, and 3D MRI volumes through modality-specific encoders, projects them into a shared latent space, and aligns them with a Gramian contrastive loss that measures the volume spanned by the three modality embeddings. The aligned representations are fused through a cross-modal gating module and optimized jointly for three-class glioma grading and overall survival prediction. We evaluate GLORIA on a matched TCGA-GBM/LGG and BraTS21 cohort, comprising 132 patients with all three modalities. On the shared trimodal test set, GLORIA improves over the bimodal WSI-mRNA baseline in all the metrics considered.
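
Note: the "volume spanned by the three modality embeddings" mentioned above has a standard closed form: the volume of the parallelotope spanned by three vectors is the square root of the determinant of their Gram matrix. The sketch below computes only that quantity. How the paper turns it into a contrastive loss is not stated in the abstract, and the function names and example vectors are assumptions.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_matrix(vectors):
    """Matrix of pairwise inner products G[i][j] = <v_i, v_j>."""
    return [[dot(u, v) for v in vectors] for u in vectors]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )

def spanned_volume(a, b, c):
    """Volume of the parallelotope spanned by a, b, c: sqrt(det(Gram))."""
    g = gram_matrix([a, b, c])
    return math.sqrt(max(det3(g), 0.0))  # clamp tiny negative rounding errors

# Mutually orthogonal unit embeddings span unit volume (maximally "misaligned").
orthogonal = spanned_volume([1.0, 0.0, 0.0, 0.0],
                            [0.0, 1.0, 0.0, 0.0],
                            [0.0, 0.0, 1.0, 0.0])

# Identical embeddings span zero volume (perfectly aligned).
aligned = spanned_volume([1.0, 0.0, 0.0, 0.0],
                         [1.0, 0.0, 0.0, 0.0],
                         [1.0, 0.0, 0.0, 0.0])
```

Because the Gram matrix depends only on inner products, the same computation works for embeddings of any dimension, and a single scalar captures the joint agreement of all three modalities instead of three pairwise similarities.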

2606.14560 2026-06-15 math.OC cs.LG stat.ML (new submission)

Free Heavy-Tailed Lunch for Muon: A Theoretical Justification of Empirical Success

Florian Hübler, Thomas Pethick, Suvrit Sra

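Affiliations Department of Computer Science, ETH Zurich, Switzerland; Department of Mathematics, Technical University of Munich, Germany; Munich Center for Machine Learning (MCML)

AI summary For heavy-tailed non-convex optimisation, this paper proves that non-Euclidean methods such as Muon achieve optimal sample complexity under nuclear-norm stationarity, avoiding the additional dimension-dependent cost that Euclidean methods incur, and supports the theory with experiments on large language models.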

Abstract

Non-Euclidean optimisation methods with matrix-valued updates, such as Muon and Scion, have recently shown strong empirical performance for training Transformer models, yet their theoretical advantages over Euclidean methods remain poorly understood. We address this gap in the heavy-tailed non-convex regime, where stochastic gradients have bounded $p$-th central moments, $p \in (1,2]$. We show that certain non-Euclidean methods achieve optimal sample complexity under stronger stationarity measures, while Euclidean methods incur additional dimension-dependent costs. As a consequence, for $m \times n$ matrices, Muon finds an $\varepsilon$-stationary point in nuclear norm within $\mathcal{O}\left(\min\{m, n\} \frac{\Delta_1 L}{\varepsilon^2} \left(\frac{\sigma}{\varepsilon}\right)^{\frac{p}{p-1}}\right)$ samples, absorbing heavy-tailed noise without extra dimension dependence, unlike Euclidean methods. We further prove this sample complexity, including its dimension dependence, is optimal for all first-order methods under nuclear-norm stationarity. Experiments on large language models support our theory. Surprisingly, our results suggest that other Schatten geometries beyond the spectral geometry of Muon can perform competitively in certain settings.
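
Note: to make the rate above concrete, the exponent $\frac{p}{p-1}$ can be evaluated at two values of $p$. This is plain arithmetic on the bound as stated in the abstract, not an additional result from the paper.

```latex
\begin{align*}
p = 2:\quad & \frac{p}{p-1} = 2
  \;\Longrightarrow\;
  \mathcal{O}\!\left(\min\{m,n\}\,\frac{\Delta_1 L}{\varepsilon^{2}}
  \left(\frac{\sigma}{\varepsilon}\right)^{2}\right)
  = \mathcal{O}\!\left(\min\{m,n\}\,\frac{\Delta_1 L\,\sigma^{2}}{\varepsilon^{4}}\right), \\
p = \tfrac{3}{2}:\quad & \frac{p}{p-1} = 3
  \;\Longrightarrow\;
  \mathcal{O}\!\left(\min\{m,n\}\,\frac{\Delta_1 L}{\varepsilon^{2}}
  \left(\frac{\sigma}{\varepsilon}\right)^{3}\right)
  = \mathcal{O}\!\left(\min\{m,n\}\,\frac{\Delta_1 L\,\sigma^{3}}{\varepsilon^{5}}\right).
\end{align*}
```

The bounded-variance case $p = 2$ gives an $\varepsilon^{-4}$ dependence, and the exponent grows without bound as $p \to 1$, which is where the noise becomes heaviest.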

2606.14515 2026-06-15 cs.CR cs.AI (new submission)

Securing the Future of IoMT in the Post-Quantum Era: An Edge-Native Federated Learning Approach

Taym Alshoghri, Deemah H. Tashman, Mohammad Reza Gerami, Soumaya Cherkaoui

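Affiliations LINCS Laboratory, Department of Computer and Software Engineering, Polytechnique Montréal; Department of Computer Science, University of Toronto

AI summary To address security and privacy for resource-constrained IoMT devices that handle sensitive health data, this paper proposes a Kubernetes-based framework that integrates post-quantum cryptography into federated-learning-enabled IoMT environments. On a Raspberry Pi testbed, distributed cryptographic processing reduces latency compared with sequential designs.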

Abstract

Internet of Medical Things (IoMT) devices operate under strict resource constraints while handling highly sensitive health data, making security and privacy critical concerns. Federated learning (FL) further complicates this landscape, as model updates exchanged during training may unintentionally expose private medical information. Emerging quantum computing capabilities threaten the long-term viability of conventional lightweight cryptographic mechanisms, motivating the integration of Post-Quantum Cryptography (PQC) into IoMT systems. This article discusses key enabling technologies for quantum-resilient IoMT, including post-quantum key establishment, lightweight encryption, and edge-native orchestration. We propose a scalable Kubernetes-based framework that integrates PQC into FL-enabled IoMT environments and validate it on a Raspberry Pi testbed. Results demonstrate that distributed cryptographic processing significantly reduces latency compared to sequential designs while maintaining feasible resource overhead. The primary contribution of this work lies in the design and validation of a secure orchestration and communication framework for FL-enabled IoMT systems. We conclude by outlining future directions toward energy-aware architectures, intelligent security optimization, and resilient next-generation Intelligent Internet of Medical Things (IIoMT) ecosystems.

2606.14506 2026-06-15 stat.ML cs.LG stat.ME New submission

Beyond the Training Distribution: Evaluating Predictions Under Distribution Shift and Selection Bias

Annie Ulichney, Amanda Coston

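Affiliations Department of Statistics, University of California, Berkeley

AI summary For model evaluation when covariate shift and selective labels occur together, the paper proposes a double machine learning procedure that estimates the target risk. Experiments on eICU data show that it tracks the true target risk more accurately than methods that handle only one of the two problems.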

Abstract

Understanding how a prediction model will perform in a new environment before deployment is essential to preventing harm when algorithms inform decision-making. Two common sources of model performance degradation are (i) covariate shift, where the target covariate distribution differs from the source, and (ii) selective labels, where the observability of outcomes depends on historical decisions. We study pre-deployment model evaluation under the joint presence of covariate shift and labeling of outcomes selectively based on observed features. In particular, we present a double machine learning procedure for estimating the target risk of an arbitrary black-box prediction model under a general loss function. We show identification of this estimand under standard assumptions and derive a bias-corrected estimator based on the influence function of the target risk. Finally, we evaluate our estimator through experiments using the eICU electronic health records database, showing that it tracks the true target risk more accurately than methods that address either selective labels or covariate shift alone, as well as baselines that combine standard plug-in approaches.
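The abstract does not spell out the estimator, so the following is only a minimal sketch of a generic doubly robust (bias-corrected) risk estimate of the kind it describes. It assumes that the nuisance quantities have already been fitted elsewhere: a loss regression, a labeling propensity, and a source-to-target density ratio. The function name `dr_target_risk` and all argument names are hypothetical, and cross-fitting is omitted.

```python
from typing import Sequence


def dr_target_risk(
    mu_target: Sequence[float],
    mu_source: Sequence[float],
    loss_source: Sequence[float],
    labeled: Sequence[int],
    propensity: Sequence[float],
    weight: Sequence[float],
) -> float:
    """Generic doubly robust target-risk estimate (illustrative assumption).

    mu_target[j]   estimated E[loss | X, labeled] at target point j
    mu_source[i]   the same regression evaluated at source point i
    loss_source[i] observed loss at source point i (ignored if unlabeled)
    labeled[i]     1 if the outcome was observed, else 0
    propensity[i]  estimated P(labeled = 1 | X_i), assumed > 0
    weight[i]      estimated density ratio p_target(X_i) / p_source(X_i)
    """
    # Plug-in term: average the loss regression over the target sample.
    plug_in = sum(mu_target) / len(mu_target)

    # Correction term: reweighted residuals on labeled source points only.
    correction = 0.0
    for mu, loss, d, pi, w in zip(
        mu_source, loss_source, labeled, propensity, weight
    ):
        if d:
            correction += (w / pi) * (loss - mu)

    return plug_in + correction / len(mu_source)
```

The plug-in term alone is biased when the loss regression is misspecified; the second term corrects for that using the labeled source residuals, which is the usual motivation for influence-function-based estimators.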

2606.14498 2026-06-15 physics.chem-ph cs.AI New submission

A Fixed-Point Neural Operator for Size- and Functional-Transferable Hamiltonian Prediction

Yunhong Lou, Xihang Yue, Xinran Wei, Tianqi Deng, Linchao Zhu

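Affiliations Zhejiang University; Zhongguancun Academy; Zhongguancun Institute of Artificial Intelligence

AI summary Proposes HamEvo, a neural operator that learns the converged Hamiltonian of the self-consistent field iteration as a fixed point. Combined with density-matrix supervision, it approaches chemical accuracy in molecular property prediction, transfers across molecular sizes, and speeds up inference.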

Comments 30 pages, 5 figures, 2 tables

Abstract

Predicting the Kohn-Sham Hamiltonian with machine learning can accelerate density functional theory while retaining access to molecular orbitals, energy levels, and electronic-structure observables that energy-only surrogates cannot resolve. Yet element-wise agreement with the converged Hamiltonian, an implicit fixed point of the self-consistent field iteration, does not determine the occupied subspace that governs orbital energies and densities. Here we present HamEvo, a neural operator that learns the single-step self-consistent update and returns the converged Hamiltonian as its fixed point. HamEvo is pre-trained on intermediate self-consistent trajectories and calibrated at equilibrium with density-matrix supervision. Across benchmarks from MD17 to drug-like QMugs, HamEvo lowers Hamiltonian errors by 35-49% over direct-regression and deep-equilibrium baselines, and predicts QMugs HOMO and LUMO energies with mean absolute errors of 0.036 and 0.053 eV, near the 1 kcal/mol chemical-accuracy scale. Few-shot fine-tuning with only 20 reference conformations extends HamEvo to molecules of up to 122 atoms, well beyond the size range covered by pre-training. With thermal molecular-dynamics sampling, HamEvo captures temperature-dependent HOMO-LUMO gap renormalization beyond the harmonic approximation. Inference is up to 242 times faster than conventional DFT.
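The idea of returning the converged Hamiltonian as the fixed point of a single-step update can be illustrated with a plain fixed-point iteration. This is a toy sketch, not HamEvo: the `update` callable stands in for the learned operator, the state is a flat list of numbers rather than a Hamiltonian matrix, and the names `fixed_point` and `toy_update` are hypothetical.

```python
from typing import Callable, List, Tuple


def fixed_point(
    update: Callable[[List[float]], List[float]],
    h0: List[float],
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Tuple[List[float], int]:
    """Iterate h <- update(h) until successive iterates differ by < tol."""
    h = h0
    for step in range(1, max_iter + 1):
        h_next = update(h)
        # Largest element-wise change between consecutive iterates.
        change = max(abs(a - b) for a, b in zip(h_next, h))
        if change < tol:
            return h_next, step
        h = h_next
    raise RuntimeError("fixed-point iteration did not converge")


def toy_update(h: List[float]) -> List[float]:
    """Toy contraction whose fixed point is 2.0 in every component."""
    return [0.5 * x + 1.0 for x in h]


if __name__ == "__main__":
    h_star, steps = fixed_point(toy_update, [0.0, 10.0])
    print(h_star, steps)
```

The iteration only converges here because `toy_update` is a contraction; whether a learned update behaves this way is a property of the trained model, not of the loop.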

2606.14488 2026-06-15 cs.IT cs.LG math.IT New submission

Nonlinear Two-Time-Scale Stochastic Approximation: A Sharp Phase Transition and How to Beat It

Dhruv Sarkar, Vaneet Aggarwal

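Affiliations Indian Institute of Technology Kharagpur; Mohamed bin Zayed University of Artificial Intelligence; Purdue University

AI summary The paper identifies a regularity-dependent phase-transition boundary in the mean-square error rate of the slow iterate in nonlinear two-time-scale stochastic approximation. It then shows that subtracting an auxiliary online bias estimate from the slow update yields an $O(k^{-1})$ rate for every regularity parameter $\rho\in[0,1]$.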

Abstract

Recent finite-time analyses of nonlinear two-time-scale stochastic approximation show that under contractive assumptions the slow iterate $Y_k$ with stepsizes $\beta_k=\Theta(k^{-1})$ and $\alpha_k=\Theta(k^{-a})$, $a\in(1/2,1)$, generally satisfies a mean-square rate of order $k^{-a}$; decoupled $k^{-1}$ rates require strong local linearity. We identify a sharp regularity-dependent boundary. In a rate-determining normal form where the slow drift contains a locally linear leakage and a nonlinear remainder of order $1+\rho$ ($\rho\in[0,1]$), the uncorrected recursion satisfies \[ \mathbb{E}\|Y_k\|^2 \le C\bigl(k^{-1}+k^{-a(1+\rho)}\bigr), \] and a matching scalar Gaussian lower bound shows that the slower term is unavoidable without modifying the update. Thus the decoupled $k^{-1}$ rate is guaranteed for the uncorrected recursion exactly when $a(1+\rho)\ge 1$. This lower bound concerns only the naive update; it is not an information-theoretic obstruction. We demonstrate this by equipping the normal-form recursion with an auxiliary online bias estimator \[ M_{k+1}=M_k+\gamma_k(R(X_k)-M_k),\qquad \beta_k\ll\gamma_k\ll\alpha_k, \] and subtracting $M_k$ from the slow update. Under the same stability, moment, and remainder assumptions, the corrected recursion achieves $\mathbb{E}\|\widetilde Y_k\|^2=O(k^{-1})$ for every $\rho\in[0,1]$, including regimes where the uncorrected update provably suffers the slower rate. Finally, we prove localized transfer theorems that extend the phase-transition mechanism to general nonlinear TTSA in fast-manifold coordinates. The proofs are non-asymptotic and rely on two Abel-transform cancellations: one for the locally linear fast-error leakage, and one for the tracked nonlinear bias.
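To make the role of the three step sizes concrete, here is a toy scalar simulation of a slow update with an online bias estimator. Only the estimator recursion and the ordering of the step sizes (slow, then estimator, then fast) come from the abstract. The drift terms, the Gaussian noise, the remainder taken as the absolute value of the fast iterate raised to the power one plus rho, and the exponents `a=0.6` and `g=0.8` are assumptions chosen for illustration, not the paper's normal form, and the code makes no claim about rates.

```python
import random
from typing import Tuple


def run_ttsa(
    n_steps: int = 20000,
    a: float = 0.6,
    g: float = 0.8,
    rho: float = 0.5,
    seed: int = 0,
    correct: bool = True,
) -> Tuple[float, float]:
    """Toy two-time-scale recursion with an optional bias correction.

    Returns the final slow iterate y and the final bias estimate m.
    """
    rng = random.Random(seed)
    x, y, m = 1.0, 1.0, 0.0
    for k in range(1, n_steps + 1):
        alpha = k ** (-a)  # fast step size (largest)
        gamma = k ** (-g)  # estimator step size, between beta and alpha
        beta = 1.0 / k     # slow step size (smallest)

        r = abs(x) ** (1.0 + rho)   # assumed nonlinear remainder R(X_k)
        bias = m if correct else 0.0

        # Slow update: the tracked bias estimate is subtracted from R(X_k).
        y_next = y - beta * (y + r - bias + rng.gauss(0.0, 1.0))
        # Bias estimator: running average of R(X_k) on the middle scale.
        m_next = m + gamma * (r - m)
        # Fast update: noisy contraction toward zero (assumed dynamics).
        x_next = x - alpha * (x + rng.gauss(0.0, 1.0))

        x, y, m = x_next, y_next, m_next
    return y, m


if __name__ == "__main__":
    print("corrected  :", run_ttsa(correct=True))
    print("uncorrected:", run_ttsa(correct=False))
```

Running both variants with the same seed gives a quick qualitative comparison on this toy system; it is not a substitute for the non-asymptotic analysis described in the abstract.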

2606.14445 2026-06-15 cs.SE cs.AI cs.HC New submission

tap: A File-Based Protocol for Heterogeneous LLM Agent Collaboration

Minseo Kim

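Affiliations HUA Labs

AI summary Proposes tap, a protocol whose file-first design lets LLM agents from different vendors (such as Claude and Codex) collaborate on a shared codebase without shared memory or an identical runtime. In the reported data, reviews by heterogeneous model pairs recorded at least one defect or requested change more often than reviews by homogeneous pairs (69.8% vs. 53.1%).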

Comments Accepted to KCC 2026. English archival translation. 3 pages, 1 figure, 3 tables

Abstract

Existing multi-agent software development systems have proposed many forms of agent collaboration, including role-based collaboration and automated code review. However, many systems assume a common runtime, a central conversation server, or the same API family. Under these assumptions, LLM agents from different vendors cannot easily exchange messages directly from their own execution environments while dividing development and review work on a shared codebase. This paper presents tap, a file-based collaboration protocol that allows Claude (Anthropic) and Codex (OpenAI) to collaborate on one codebase without shared memory or an identical runtime. The core of tap is a file-first design that preserves markdown files with metadata as original messages, combines a file inspection path (file communication, Tier 1) with real-time notification paths for Claude and Codex (real-time communication, Tier 2), and isolates work through separate git worktrees. Even if real-time notification fails or a receiver restarts, the message file remains available and the same content can be inspected again. In a 27-day, 37-generation self-applied operation where tap was used to develop and review itself, we collected 209 tap-related pull requests and 717 operational artifacts. An analysis of 375 review artifacts showed that the share of reviews recording at least one defect or requested change was 69.8% for heterogeneous model pairs and 53.1% for homogeneous model pairs. These results show that tap, which combines file-based message preservation with real-time notification, operates in a real production repository, and that combining heterogeneous models and execution environments can broaden review perspectives. tap is distributed as the open-source npm package @hua-labs/tap (v0.5.2).
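The file-first idea (a markdown file with metadata is the message, and it can be re-read after a failed notification or a restart) can be sketched in a few lines. The directory layout, the front-matter fields `id`, `from`, and `to`, and the function names below are hypothetical; they are not the actual format of the @hua-labs/tap package.

```python
import tempfile
from pathlib import Path
from typing import Dict, Tuple


def write_message(
    inbox: Path, msg_id: str, sender: str, recipient: str, body: str
) -> Path:
    """Persist one message as a markdown file with a metadata header."""
    inbox.mkdir(parents=True, exist_ok=True)
    path = inbox / f"{msg_id}.md"
    header = f"---\nid: {msg_id}\nfrom: {sender}\nto: {recipient}\n---\n"
    path.write_text(header + body, encoding="utf-8")
    return path


def read_message(path: Path) -> Tuple[Dict[str, str], str]:
    """Re-read a message file; safe to call any number of times."""
    text = path.read_text(encoding="utf-8")
    # The file starts with '---', so the first split piece is empty.
    _, header, body = text.split("---\n", 2)
    meta = dict(line.split(": ", 1) for line in header.splitlines())
    return meta, body


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        p = write_message(
            Path(tmp) / "inbox", "0001", "claude", "codex",
            "Please review the parser change.\n",
        )
        print(read_message(p))
```

Because the file is the source of truth, a receiver that missed a real-time notification can still list the inbox directory and read the same content later.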

2606.14373 2026-06-15 hep-ex cs.LG hep-ph physics.data-an physics.ins-det New submission

Machine-learned particle flow as a foundation model for collider physics

Farouk Mokhtar, Joosep Pata, Michael Kagan, Javier Duarte

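Affiliations University of California San Diego; National Institute of Chemical Physics and Biophysics; SLAC National Accelerator Laboratory

AI summary Treating event reconstruction as a machine learning problem, the paper reuses the latent representations learned by an MLPF model for three analysis tasks (jet flavor identification, jet energy regression, and missing momentum regression) and substantially improves over baselines that use kinematic features alone. A single linear layer on the latent representations is competitive with state-of-the-art architectures and outperforms the baseline on missing momentum regression with approximately 35 times fewer parameters.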

Comments 15 pages, 11 figures

Abstract

The workflow from particle collision to physics analysis passes through a series of reconstruction steps that are traditionally modular and disconnected, with no shared representation linking low-level detector data to high-level analysis tasks. We show that casting event reconstruction as a machine learning problem naturally produces such a shared representation. We repurpose a machine learning model trained for particle-flow reconstruction (MLPF) to perform three distinct analysis tasks: jet flavor identification, jet energy regression, and missing momentum regression. By appending the per-particle latent representations learned during reconstruction as additional input features, we substantially improve over baselines that use kinematic features alone. We further demonstrate that a single linear layer trained using only the latent representations achieves competitive performance against state-of-the-art baseline architectures, and outperforms the baseline for missing momentum regression with approximately 35 times fewer parameters. These results demonstrate that the latent representations learned during reconstruction encode essential physics information needed for downstream analysis, establishing MLPF as a foundation model and offering a concrete step toward an end-to-end pipeline from detector data to physics analysis.

2606.14357 2026-06-15 cs.SE cs.AI New submission

No Accidental Software Agent First Canonical Code for Human Code Entropy Reduction and 30 to 500 times Lower Frontier Model Requirements

Jepson Taylor

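Affiliations GitHub

AI summary Proposes agent-first canonical code, which quotients out the accidental entropy in human code by behavior equivalence; the core ideas are behavior-equivalence quotienting and proof-carrying changes. The 30- to 500-fold reduction in frontier model requirements is stated as a hypothesis, not a measured result.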

Comments 36 pages

Abstract

Frontier coding models may spend substantial capacity learning not only program behavior, but also accidental entropy in human repositories. Such repositories contain valuable signals: tests, incidents, migrations, edge cases, product judgment, and operational history. These signals are entangled with framework churn, naming drift, generated-source ambiguity, dependency rituals, CI dialects, weak proof routes, and human-oriented review customs. We propose agent-first canonical code, a proof-carrying substrate that rewrites routine product software into canonical behavior profiles, typed change algebra, proof lanes, constrained edit grammars, semantic patch cells, runtime negative memory, and proof-carrying change objects. The core hypothesis is that quotienting software by behavior equivalence under a declared oracle can collapse equivalent encodings into governed representatives with explicit evidence and proof obligations. The endpoint is amortized cost per verified correct change, including source, context, reasoning, tools, verification, security, provenance, review, failed loops, defects, and foundry cost under a common oracle. Reported reduction bands are hypotheses, not measured frontier results. The proposed limit is a No-Accident Horizon: removable accident decreases until residual novelty, evidence, governance, risk, and future optionality dominate. For supported routine-product distributions, this gives a defensible planning target near 100-fold all-in cost reduction, not a guarantee for all software. Preliminary QLoRA experiments on Qwen2.5-Coder-14B show that 64,088 canonical trajectories are learnable and suppress tested forbidden-language markers, but do not establish behavior preservation, scaling economics, or verified-change cost. The contribution is a falsifiable program centered on minimum functional description length and verified-change cost.

2606.14356 2026-06-15 cs.DC cs.AI New submission

PLAIground: SLO-Driven Runtime Model Selection for Compound AI Systems in the Edge-Cloud-Space Continuum

Milos Gravara, Cynthia Marcelino, Andrija Stanisic, Stefan Nastic

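Affiliations University of California, Berkeley

AI summary Proposes the PLAIground framework, which uses the CAIM abstraction and the Pixie algorithm to select models at runtime for Compound AI systems, switching models dynamically while meeting accuracy, latency, and cost SLOs. In the evaluation, Pixie reaches up to 91.3% accuracy while remaining SLO-compliant.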

Abstract

Applications in the 3D Computing Continuum, which unifies edge, cloud, and space, require combining multiple AI tasks such as object detection, time-series analytics, and natural language processing into Compound AI systems. These systems must satisfy stringent Service Level Objectives (SLOs) on accuracy, latency, and cost. A key mechanism for maintaining SLO compliance of Compound AI systems is runtime model selection, where AI models are dynamically switched for each workflow task. However, existing distributed and compound AI frameworks do not natively support runtime model selection. We present PLAIground, a framework that enables runtime model selection for Compound AI systems. PLAIground introduces Compoundable AI Model (CAIM) abstraction, which decouples task semantics from AI model implementations via Task and Data Contracts, enabling model switching without workflow changes. Additionally, PLAIground introduces Pixie, an SLO-driven runtime model selection algorithm, which dynamically selects the most suitable model for each task during execution. Our evaluation on two realistic Compound AI workflows demonstrates that Pixie achieves up to 91.3% accuracy while maintaining SLO compliance where fixed-model strategies either violate cost and latency budgets up to 21x or miss accuracy targets by 4%.
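Runtime model selection under SLOs can be pictured as a constrained choice among candidate models for a task. The sketch below is not Pixie, whose details the abstract does not give; it is a simple assumed rule (pick the most accurate candidate that satisfies the latency, cost, and accuracy bounds), and `ModelProfile` and `select_model` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ModelProfile:
    """Observed or estimated characteristics of one candidate model."""
    name: str
    accuracy: float     # fraction in [0, 1]
    latency_ms: float   # per-request latency
    cost: float         # per-request cost, arbitrary units


def select_model(
    candidates: Iterable[ModelProfile],
    max_latency_ms: float,
    max_cost: float,
    min_accuracy: float,
) -> Optional[ModelProfile]:
    """Return the most accurate candidate meeting all three SLO bounds."""
    feasible = [
        m for m in candidates
        if m.latency_ms <= max_latency_ms
        and m.cost <= max_cost
        and m.accuracy >= min_accuracy
    ]
    if not feasible:
        return None  # caller decides how to degrade when no model fits
    return max(feasible, key=lambda m: m.accuracy)


if __name__ == "__main__":
    pool = [
        ModelProfile("small", 0.82, 20.0, 0.1),
        ModelProfile("medium", 0.90, 60.0, 0.4),
        ModelProfile("large", 0.95, 300.0, 2.0),
    ]
    print(select_model(pool, max_latency_ms=100.0, max_cost=1.0,
                       min_accuracy=0.85))
```

Re-running the selection whenever the measured profiles change is what makes the choice a runtime decision; a fixed-model strategy corresponds to calling it once and never again.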

2606.14348 2026-06-15 physics.soc-ph cond-mat.stat-mech cs.CL New submission

Detecting Historical Turning Points in Italian Media: A Complex Systems Approach to a Diachronic News Corpus

Dario Zarcone, Salvatore Miccichè, David Sanchez

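Affiliations Department of Physics and Chemistry, University of Palermo; Institute for Cross-Disciplinary Physics and Complex Systems (IFISC), UIB-CSIC

AI summary Using NLP and complex-systems methods on about 600,000 articles published by the Italian newspaper La Repubblica between 1985 and 2000, the paper detects key historical turning points without prior labeling, including Italy's transition from the First to the Second Republic and the Gulf War.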

Comments 16 pages, 9 figures, 1 table

Abstract

The increasing availability of large-scale textual corpora has opened new possibilities for data-driven, quantitative approaches to historical analysis using Natural Language Processing (NLP). However, diachronic corpora with historical relevance from the pre-digital era remain scarce and often incomplete. We present a quantitative approach to historical analysis based on the reconstruction and exploration of a diachronic corpus of around 600,000 articles from the Italian newspaper "La Repubblica", covering all the articles published from the 1st of January 1985 to the 31st of December 2000 - a period of major political, social, and geopolitical change in Italy and globally. Using NLP techniques, we analyze the text at both lexical and semantic levels; we then apply tools from complex systems and statistical physics to trace shifts in media discourse over time. This allows us to detect key transition periods, such as the transition from the First Republic to the Second Republic in Italy, or major international conflicts like the Gulf War or the Kosovo War, without relying on prior labeling. The results show how combining computational linguistics with ideas from complex systems can offer new quantitative insight into historical changes, opening up new paths for studying the dynamics of media and society through large-scale textual data.

2606.14335 2026-06-15 math.ST cs.IT cs.LG math.IT stat.TH New submission

Recovery thresholds for hidden weighted sparse graphs

Zhe Hou, Jingcheng Liu

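Affiliations State Key Laboratory for Novel Software Technology

AI summary Studies thresholds for recovering a hidden graph from a noisy weighted complete graph. The paper gives a unified characterization that connects the KL divergence between the weight distributions to the first moment threshold in the Erdős-Rényi random graph model, and extends the results to partial recovery and the All-or-Nothing phenomenon.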

Comments 34 pages, 4 figures

Abstract

Recovering structural information from noisy high-dimensional data is a fundamental task in statistical inference. We investigate the recovery thresholds for a graph hidden in a randomly weighted complete graph. Specifically, an unknown graph $H^* \in H_n$ is chosen uniformly at random, and hidden in a complete graph of $n$ vertices as follows: the weight of an edge $e \in H$ is distributed independently according to $P_n$; otherwise the weight is distributed independently according to $Q_n$. The goal is to recover almost all of $H$ from these edge weights. Assuming a local Lipschitzness of the Rényi divergence between distributions $P_n$ and $Q_n$, and a mild density condition for the graphs $H_n$, we give a unified characterization of the information-theoretic limit for recovering almost all of $H$ (also known as almost exact recovery). Our characterization connects the KL divergence between $P_n$ and $Q_n$ to the logarithm of the first moment threshold of $H$ in the Erdős-Rényi random graph model $G(n,p)$. Our lower bound also extends to the task of partial recovery, in which only a constant $\lambda$-fraction of $H$ needs to be recovered. Last but not least, for certain Bernoulli and Exponential regimes, and for Gaussian distributions, we are able to show an All-or-Nothing (AoN) threshold phenomenon at the exponential scale.

2606.14327 2026-06-15 cs.SE cs.AI cs.ET New submission

I'm Sorry Driver, I'm Afraid I Can't Do That: Appraising the Safety of LLMs within Automotive Contexts

Shaun Feakins, Ibrahim Habli, Kim Littler, Robert Palin

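Affiliations UKRI AI Centre for Doctoral Training in Safe Artificial Intelligence Systems (SAINTS); University of York; Jaguar Land Rover

AI summary From a safety-assurance perspective, the paper appraises existing frameworks that integrate LLMs into automotive control tasks, identifies conceptual and concrete challenges, and uses a case study to propose assurance mechanisms going forward.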

Comments Accepted at the Dependable AI in Embedded Systems (DAIES) Workshop at SAFECOMP 2026; 15 pages, 3 figures, 2 tables

Abstract

This paper appraises recent frameworks within AI development to integrate LLMs into control tasks in automotive contexts from the perspective of safety assurance. This work has built upon the rapid integration of LLMs across automotive settings. However, we find that at present, these frameworks face significant challenges, limiting their efficacy in real-time safety-critical contexts. Firstly, we consider conceptual challenges, including the fact that deployers are faced with a dual challenge, wherein they must assure a model which has been developed upstream, i.e. as general-purpose tools by the large AI labs, in a downstream context, i.e. into specific vehicle architectures. Secondly, we consider concrete challenges from across existing standards. We show that there are currently both fundamental engineering constraints covered in ISO21448, such as latency, and novel LLM-specific issues, such as alignment-related issues covered in ISO/PAS8800. We ground both examples in a concrete introductory, experimental case study exploring an existing open-source repository, Talk2Drive. We present a safety argument in order to make explicit the limitations of existing solutions. Nonetheless, given that the use of LLMs in automotive contexts is being explored at a technical level and operationalised, we propose potential assurance mechanisms for LLM-related hazardous events going forward.

2606.14313 2026-06-15 stat.ML cs.LG New submission

Nonlocal Bayesian Modeling of Continuous Spatio-Temporal Dynamics

Jaeyeong Lee, Heeyoung Kim

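Affiliations Department of Industrial and Systems Engineering, Korea Advanced Institute of Science and Technology (KAIST)

AI summary Proposes the NLBST model, which combines a coordinate-based basis expansion and a continuous-time ODE with a nonlocal integro-differential equation to give continuous spatio-temporal forecasts with uncertainty quantification under irregular observations.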

Comments Accepted at UAI 2026

Abstract

Real-world spatio-temporal forecasting must handle irregular time points, spatially sparse observations, and the need for uncertainty quantification. This setting is often further compounded by nonlocal interactions (long-range spatial coupling). Modeling continuous-space, continuous-time nonlocal dynamics naturally leads to infinite-dimensional integro-differential equations (IDEs), making principled Bayesian inference intractable. We propose the NonLocal Bayesian Spatio-Temporal model (NLBST), a hierarchical Bayesian framework for continuous spatio-temporal fields that learns explicit nonlocal coupling while retaining tractable inference. NLBST represents the latent field via a coordinate-based spatial basis expansion and models the coefficient process with a continuous-time ODE whose learnable linear operator corresponds to a Galerkin reduction of a nonlocal IDE; a Neural ODE residual captures additional nonlinear dynamics. A linear-Gaussian observation model enables Kalman-style sequential updates under missing and irregular observations, while the spatial basis representation enables inductive prediction at unmeasured locations without retraining. Global parameters are learned via variational inference, and uncertainty is handled through a Bayesian hierarchy. Experiments on synthetic and real-world datasets demonstrate strong forecasting and spatial generalization with well-calibrated uncertainty, yielding substantial gains over baselines in strongly nonlocal and partially observed regimes.
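A linear-Gaussian observation model is what allows Kalman-style updates that simply skip the correction step when an observation is missing. The sketch below shows that mechanism for a single scalar state in discrete time. It is a textbook Kalman step, not NLBST: the paper's coefficient process follows a continuous-time ODE, and the parameter names `a`, `q`, `h`, and `r` (transition, process noise, observation map, observation noise) are assumptions made for this example.

```python
from typing import Optional, Tuple


def kalman_step(
    mean: float,
    var: float,
    obs: Optional[float],
    a: float = 1.0,
    q: float = 0.1,
    h: float = 1.0,
    r: float = 0.5,
) -> Tuple[float, float]:
    """One predict/update step for a scalar linear-Gaussian model.

    obs=None marks a missing observation: only the prediction is applied,
    so the variance grows instead of shrinking.
    """
    # Predict: propagate mean and variance through the linear dynamics.
    mean_pred = a * mean
    var_pred = a * a * var + q
    if obs is None:
        return mean_pred, var_pred

    # Update: blend prediction and observation using the Kalman gain.
    gain = var_pred * h / (h * h * var_pred + r)
    mean_new = mean_pred + gain * (obs - h * mean_pred)
    var_new = (1.0 - gain * h) * var_pred
    return mean_new, var_new


if __name__ == "__main__":
    state = (0.0, 1.0)
    for y in [1.0, None, None, 0.8]:  # irregular / missing observations
        state = kalman_step(*state, y)
        print(state)
```

Irregular time points can be handled in the same loop by letting `a` and `q` depend on the elapsed time between observations, which is where a continuous-time model for the state would enter.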

2606.14309 2026-06-15 cs.DB cs.AI cs.LO New submission

Transforming Shape Schemas with Composable Property-Graph Queries (Extended Version)

Philipp Seifer, Daniel Hernández, Ralf Lämmel, Steffen Staab

Affiliations The Software Languages Team; University of Koblenz; Institute for Artificial Intelligence; University of Stuttgart; University of Southampton

AI summary Studies how to infer an output schema from an input schema (in ProGS) and a query (in G-CORE). Mappings to RDF, SHACL, and SPARQL CONSTRUCT allow description logic reasoners to infer the schema constraints automatically.

Abstract

Property graphs may be constrained by schemas that inform both query engines and human users about the shape of valid data, enforcing a contract between data provider and consumer. Composable property-graph queries transform input graphs into output graphs. Then, the question arises of which schema can be expected after one (or several) transformation steps. We investigate how schema constraints can be inferred given an input schema and a transforming query. Specifically, we propose a reasoning procedure that, given an input schema in ProGS and a query in G-CORE infers an output schema. Since graph updates will happen frequently, our inference procedure does not rely on graph instances, such that the computed output schema applies to all graphs originating from any input graph complying with the input schema. Related work has addressed this problem for SPARQL CONSTRUCT queries, encoding it in Description Logics (DLs) so that the output schema is entailed by axioms inferred from input schema and queries. Property graphs and their queries, however, complicate the matter, as property graphs feature label and property annotations as well as first-class edges. Thus, reification has to be used in one way or another, though available DLs lack the means to encode such features directly. We approach this novel challenge via a family of mappings for i) property graphs reified in RDF, aligned with ii) a mapping from ProGS to SHACL and iii) a mapping from G-CORE to SPARQL CONSTRUCT queries. In this manner, schema inference for property graphs becomes manageable, as we break apart the problem through the extra mapping layer and utilize efficient DL reasoners. We develop the metatheory regarding the soundness of inferred schema constraints and the semantic equivalence of mapped schemas and queries.

2606.14306 2026-06-15 cs.HC cs.AI New submission

Thinking Outside the [Chat]Box: Bridging Computer Science and Industrial Design for Cognitive-Inclusive Generative AI

Virginia Francisco, Daniel Guasch, Raquel Hervás

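Affiliations University of California, Berkeley

AI summary Current GenAI interfaces impose high cognitive demands and are poorly suited to people with intellectual disabilities. Through a cross-disciplinary design challenge, the paper proposes a dual-layer framework that combines structural scaffolding from Computer Science with experiential scaffolding from Industrial Design to expand the design space for cognitively inclusive interaction.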

Abstract

Current Generative AI (GenAI) interfaces remain largely constrained to chatbox interaction, which can impose high cognitive demands on users and create substantial barriers for people with intellectual disabilities (ID), including prompt formulation difficulties, response overload, and limited mechanisms to assess information reliability. To explore alternative interaction models for cognitive accessibility, we conducted a cross-disciplinary co-design challenge in which two student cohorts (Computer Science and Industrial Design) developed interface concepts from the same set of functional requirements (e.g., prompt scaffolding, structured output, GUI-based refinement, transparency, and personalization). Comparing the resulting proposals reveals both convergence on foundational requirements (notably initial calibration, proactive prompting, and direct manipulation of response fragments) and complementary contributions that outline a multi-layered support system. Computer Science teams primarily produced structural scaffolding, emphasizing predictability, navigability, and trust through mechanisms such as reliability indicators, explicit sources, and context management for long conversations. Industrial Design teams emphasized experiential scaffolding, focusing on pacing, attention guidance, multimodality, and proactive agency, including step-by-step response flows, focus modes, and assistant-like integrations. We synthesize these findings into a dual-layer scaffolding framework that expands the design space for cognitively accessible GenAI interaction beyond chat-centric models and motivates future work on expert refinement, technical feasibility, and empirical validation with users with ID.

2606.14289 2026-06-15 math.OC cs.LG cs.NA cs.NE math.NA stat.ML 新提交

Operator Calculus for Population-Based Optimization: A Mean-Field Convergence Theory


Pekka Malo, Lauri Viitasaari, Patrik Nummi, Antti Suominen, Ankur Sinha, Olli Tahvonen

Affiliations * Aalto University

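AI summary Proposes an operator calculus that unifies a range of population-based optimization methods as compositions of three elementary operators (mutation, selection, and recombination), and establishes a modular Lyapunov principle that gives exponential convergence under stability and regularity conditions.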

Comments 71 pages, 4 figures, 2 tables; ancillary files contain Python code reproducing the numerical experiments

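Details
AI abstract

Population-based and distributional optimization methods, ranging from evolution strategies and consensus-based optimization to covariance-matrix adaptation and stochastic gradient methods viewed as distributional dynamics, are widely used for nonconvex or black-box problems, yet their convergence analyses remain scattered across algorithm-specific techniques. We introduce an operator calculus in which a broad class of such methods, after an appropriate state space is chosen and the state is augmented with memory or strategy variables where necessary, is described as a composition of three elementary operators (mutation, selection, and recombination) acting on probability measures. Under explicit stability and regularity conditions, the composite operator admits a pre-generator whose continuous-time limit is a transport-reaction-jump (TRJ) PDE that preserves the operator splitting. On this foundation we establish a modular Lyapunov principle: if a state-space Lyapunov function both dissipates under the full generator and controls the relevant search-space gauges, then the state-space Lyapunov functional and the induced search errors decay exponentially. The additive generator structure allows dissipation estimates to be assembled operator by operator, providing a toolkit for certifying the convergence of composite mean-field algorithms.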

English abstract

Population-based and distributional optimization methods, from evolution strategies and consensus-based optimization to covariance-matrix adaptation and stochastic gradient methods viewed as distributional dynamics, are widely used for nonconvex or black-box problems, yet their convergence analyses remain fragmented across algorithm-specific techniques. We introduce an operator calculus in which a broad class of such methods, after choosing an appropriate state space and, where necessary, augmenting the state by memory or strategy variables, is described as a composition of three elementary operators (mutation, selection, and recombination) acting on probability measures. Under explicit stability and regularity conditions, the composite operator admits a pre-generator whose continuous-time limit is a transport-reaction-jump (TRJ) PDE that preserves the operator splitting. On this foundation we establish a modular Lyapunov principle. If a state-space Lyapunov function both dissipates under the full generator and controls the relevant search-space gauges, then the state-space Lyapunov functional and the induced search errors decay exponentially. The additive generator structure allows dissipation estimates to be assembled operator by operator, providing a toolkit for certifying convergence of composite mean-field algorithms.
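
The composition of mutation, selection, and recombination described in this abstract can be illustrated with a small finite-particle toy. The sketch below is an illustration only, not the paper's construction: the paper works with operators on probability measures and a continuous-time limit, whereas here a list of floats stands in for the measure. The function names, the Gaussian mutation, the truncation selection, the pairwise-averaging recombination, and the order in which the three operators are composed are all assumptions made for this example.

```python
import random
from typing import Callable, List

Population = List[float]  # particles standing in for a probability measure on R


def mutate(pop: Population, sigma: float, rng: random.Random) -> Population:
    """Mutation: add independent Gaussian noise to every particle."""
    return [x + rng.gauss(0.0, sigma) for x in pop]


def select(pop: Population, objective: Callable[[float], float], keep: float) -> Population:
    """Selection: keep the best `keep` fraction, then copy it back to full size."""
    ranked = sorted(pop, key=objective)
    elite = ranked[: max(1, int(keep * len(pop)))]
    return [elite[i % len(elite)] for i in range(len(pop))]


def recombine(pop: Population, rng: random.Random) -> Population:
    """Recombination: average each particle with a randomly drawn partner."""
    return [0.5 * (x + rng.choice(pop)) for x in pop]


def step(pop: Population, objective: Callable[[float], float],
         rng: random.Random, sigma: float = 0.1, keep: float = 0.5) -> Population:
    """One generation, written as the composition recombine(select(mutate(pop)))."""
    return recombine(select(mutate(pop, sigma, rng), objective, keep), rng)


def run(objective: Callable[[float], float], size: int = 50,
        generations: int = 200, seed: int = 0) -> Population:
    """Iterate the composite operator from a uniform initial population."""
    rng = random.Random(seed)
    pop = [rng.uniform(-10.0, 10.0) for _ in range(size)]
    for _ in range(generations):
        pop = step(pop, objective, rng)
    return pop
```

Calling `run(lambda x: (x - 3.0) ** 2)` returns a population that concentrates near the minimizer at 3. Because each operator is a separate function, any one of them can be swapped without touching the others, which is the kind of modularity the abstract attributes to the operator view.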

2606.14269 2026-06-15 cs.IR cs.CL New submission

ScoreGate: Adaptive Chunk Selection for Retrieval-Augmented Generation via Dual-Score Statistical Fusion


Karamvir Singh, Arvind Jain

Affiliations * HighLevel, Inc.

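AI summary To address the over-retrieval and under-retrieval caused by fixed-K retrieval, proposes ScoreGate, which uses bi-encoder similarity and cross-encoder reranker scores to control retrieval cardinality adaptively with no additional model calls. On MS MARCO it reaches MRR@10 = 0.401 with 35% fewer retained chunks than Standard Top-K, and on an internal benchmark it observed zero false positives.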

Comments 20 pages, 6 figures, 14 tables

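Details
AI abstract

Fixed-cardinality retrieval injects a constant number of top-K chunks into the generator regardless of query complexity, which causes over-retrieval for narrow queries and under-retrieval for compositional ones. We describe ScoreGate, a lightweight score-space decision mechanism that controls retrieval cardinality at inference time using two scores the standard pipeline already produces, the bi-encoder similarity s_i and the cross-encoder reranker score r_i, with no additional model inference calls. Its core insight is that cross-encoder affirmation can rescue semantically relevant chunks that bi-encoder retrieval ranks poorly because of vocabulary mismatch, a failure mode that neither fixed-K nor single-score thresholding addresses. On MS MARCO (200 dev queries), ScoreGate achieves MRR@10 = 0.401 with 35% fewer retained chunks than Standard Top-K. On an internal benchmark (n=300, Fleiss' kappa=0.87), ScoreGate observed zero false positives (95% CI [96.4%, 100%]) at 97.77-99.34% recall, with 34.8% fewer tokens per query and only 31ms of added latency. Results on both MS MARCO and real-world production traffic suggest that adaptive retrieval cardinality can improve retrieval efficiency without degrading retrieval quality.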

English abstract

Fixed-cardinality retrieval injects a constant top-K chunks into the generator regardless of query complexity, causing over-retrieval for narrow queries and under-retrieval for compositional ones. We describe ScoreGate, a lightweight score-space decision mechanism that controls retrieval cardinality at inference time using two scores already produced by the standard pipeline: bi-encoder similarity s_i and cross-encoder reranker score r_i, with no additional model inference calls required. Its core insight is that cross-encoder affirmation can rescue semantically relevant chunks that bi-encoder retrieval ranks poorly due to vocabulary mismatch -- a failure mode unaddressed by fixed-K or single-score thresholding. On MS MARCO (200 dev queries), ScoreGate achieves MRR@10 = 0.401 with 35% fewer retained chunks than Standard Top-K. On an internal benchmark (n=300, Fleiss' kappa=0.87), ScoreGate observed zero false positives (95% CI [96.4%, 100%]) at 97.77-99.34% recall, with 34.8% fewer tokens per query and only 31ms added latency. Results on both MS MARCO and real-world production traffic suggest that adaptive retrieval cardinality can improve retrieval efficiency without degrading retrieval quality.
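
The abstract does not spell out the decision rule, so the following is only a hypothetical sketch of what a dual-score gate could look like, not ScoreGate itself. It assumes per-query z-score normalization of both score lists, an equal-weight fused score, and a separate "rescue" branch that keeps a chunk on a strong cross-encoder score alone. The function names `zscores` and `gate_chunks` and both threshold values are invented for illustration.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def zscores(values: Sequence[float]) -> List[float]:
    """Standardize scores within one query; constant lists map to all zeros."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0.0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def gate_chunks(s: Sequence[float], r: Sequence[float],
                fuse_threshold: float = 0.5,
                rescue_threshold: float = 1.0) -> List[int]:
    """Return indices of chunks to keep for one query.

    s: bi-encoder similarities s_i; r: cross-encoder reranker scores r_i.
    A chunk is kept if its fused z-score clears `fuse_threshold`, or if the
    cross-encoder z-score alone clears `rescue_threshold` (hypothetical rule).
    """
    if len(s) != len(r):
        raise ValueError("s and r must have the same length")
    if not s:
        return []
    zs, zr = zscores(s), zscores(r)
    kept = []
    for i, (a, b) in enumerate(zip(zs, zr)):
        fused = 0.5 * (a + b)  # equal-weight fusion, an assumption
        if fused >= fuse_threshold or b >= rescue_threshold:
            kept.append(i)
    return kept
```

With `s = [0.9, 0.8, 0.2, 0.3, 0.25]` and `r = [0.95, 0.1, 0.9, 0.1, 0.05]`, the gate keeps chunks 0 and 2. Chunk 2 has a low bi-encoder similarity and survives only through the rescue branch, which mirrors the failure mode the abstract describes. The number of retained chunks varies per query, in contrast to a fixed K.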

2606.14268 2026-06-15 stat.ML cs.LG New submission

Gradient boosting for extremes: sampling theory and application to insurance


Stéphane Lhaut, Olivier Lopez

Affiliations * CREST, CNRS, Ecole polytechnique, Groupe ENSAE-ENSAI, ENSAE Paris, Institut Polytechnique de Paris, Palaiseau, France

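AI summary Develops a statistical learning theory for gradient boosting estimation of covariate-dependent Generalized Pareto distributions, uses an orthogonal reparametrization to improve convergence stability, and applies the method to insurance data.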

Comments 36 pages, 10 figures

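Details
AI abstract

We develop a statistical learning theory for gradient boosting applied to estimating covariate-dependent Generalized Pareto (GP) distributions in Peaks-over-Threshold modeling. After an orthogonal reparametrization of the GP likelihood that diagonalizes its Fisher information matrix, we cast the estimation problem in the Empirical Risk Minimization (ERM) framework and derive non-asymptotic error bounds for the boosting estimator. Our analysis accounts for three distinct sources of error: statistical fluctuations, the approximation bias inherent in the asymptotic nature of the GP model (controlled under second-order regular variation), and the approximation error associated with a finite number of boosting iterations, making the resulting bias-variance trade-off explicit. Through simulations we illustrate the practical benefits of the reparametrization, showing that it significantly reduces gradient correlation during training and improves convergence stability. The method is applied to a medical malpractice insurance dataset from the Texas Department of Insurance containing more than 18,000 closed claims. The gradient boosting approach fits the tail of the settlement cost distribution well and reveals that the number of days to settlement is the dominant predictor of tail heaviness, consistent with earlier findings in the reserving literature.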

English abstract

We develop a statistical learning theory for gradient boosting applied to the estimation of covariate-dependent Generalized Pareto (GP) distributions in the context of Peaks-over-Threshold modeling. After an orthogonal reparametrization of the GP likelihood that diagonalizes its Fisher information matrix, we cast the estimation problem within the Empirical Risk Minimization (ERM) framework and derive non-asymptotic error bounds for the boosting estimator. Our analysis accounts for three distinct sources of error in the process: statistical fluctuations, the approximation bias inherent to the asymptotic nature of the GP model (controlled under second-order regular variation), and the approximation error associated with the finite number of boosting iterates, making explicit the resulting bias-variance trade-off. We illustrate the practical benefits of the reparametrization through simulations, showing that it significantly reduces gradient correlation during training and improves convergence stability. The methodology is applied to a medical malpractice insurance dataset from the Texas Department of Insurance, comprising over 18 000 closed claims. The gradient boosting approach yields a good fit for the tail of settlement cost distributions and reveals that the number of days to settlement is the dominant predictor of tail heaviness, consistent with earlier findings in the reserving literature.
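
The abstract does not state which orthogonal reparametrization is used. One choice known from the extreme-value literature replaces the scale σ with ν = σ(1 + ξ), which makes the Fisher information of the GP model diagonal in (ξ, ν) for ξ > -1/2. The sketch below assumes that choice and shows the negative log-likelihood a boosting procedure could minimize as its empirical risk. The function name `gp_nll_orthogonal` is hypothetical, and whether the paper uses exactly this parametrization is an assumption.

```python
import math
from typing import Sequence


def gp_nll_orthogonal(excesses: Sequence[float], xi: float, nu: float) -> float:
    """Negative log-likelihood of GP threshold excesses in (xi, nu) coordinates.

    Assumes nu = sigma * (1 + xi), so sigma = nu / (1 + xi).
    Requires nu > 0 and xi > -1 so that sigma is positive.
    """
    if nu <= 0.0 or xi <= -1.0:
        raise ValueError("need nu > 0 and xi > -1")
    sigma = nu / (1.0 + xi)  # map back to the usual scale parameter
    total = 0.0
    for y in excesses:
        if y < 0.0:
            raise ValueError("excesses must be non-negative")
        if abs(xi) < 1e-12:
            # Exponential limit of the GP family as xi -> 0.
            total += math.log(sigma) + y / sigma
            continue
        t = 1.0 + xi * y / sigma
        if t <= 0.0:
            return math.inf  # observation lies outside the GP support
        total += math.log(sigma) + (1.0 + 1.0 / xi) * math.log(t)
    return total
```

In a covariate-dependent setting, ξ and ν would each be a function of the covariates, and the per-observation terms of this sum would serve as the loss whose gradients drive the boosting updates. With a diagonal Fisher information, the two gradient components are asymptotically uncorrelated, which is consistent with the reduced gradient correlation the abstract reports.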