arXivDaily: arXiv daily academic digest, updated Monday through Friday
All subject categories: 2370
2605.28843 2026-05-29 cs.DL cs.CY cs.LG

The Biosecurity Blind Spot: Systematic Dual-use Detection in Open Science Infrastructure

Vasudha Sharma, Chakresh Kumar Singh, Jayesh Choudhari, Dharmit Nakrani

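AI summary: Using a hybrid of lexical filtering and large language model evaluation, this study systematically analyzes dual-use research of concern content in bioRxiv preprints, shows that potential risks are widespread in open-access abstracts, and proposes a governance framework that combines metadata monitoring with open-science principles.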

Comments Ongoing work

Abstract

AI is transforming life sciences research at unprecedented speed, accelerating discovery across protein structure prediction, genome modeling, and drug development (Jumper et al., 2021; Mak et al., 2024). Yet this rapid advancement, coupled with the open science movement, introduces significant dual-use research concerns that have received limited empirical scrutiny. Here we present the first systematic analysis of dual-use research of concern (DURC) content on open preprint servers. We screened ~52,000 bioRxiv preprints (2024-2025) using a hybrid pipeline of lexical filtering and large language model (LLM) evaluation, scoring metadata across nine DURC categories, three PEPP (pathogens with enhanced pandemic potential) categories, and five governance categories aligned with U.S. and Australia Group oversight frameworks. Our analysis reveals that dual-use-adjacent knowledge is routinely present in openly accessible titles and abstracts, often exceeding established risk thresholds even in studies with legitimate public health objectives. While this mapping captures surface-level information diffusion, it does not measure operational capability, downstream misuse potential, or the substantial technical and biosafety barriers that constrain harmful application. We argue that institutional review processes, funding requirements, and preprint platform policies must evolve to incorporate proactive, metadata-level monitoring without compromising scientific transparency. Ultimately, harmonizing controlled-access mechanisms for high-risk methodologies with open summaries of scientific contributions offers a pragmatic framework for governing AI-accelerated biology at scale.
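
The lexical-filtering stage of the hybrid pipeline can be sketched as a keyword prefilter over titles and abstracts. Only preprints that match are forwarded to LLM evaluation. This is a minimal sketch under stated assumptions: the category names and term lists are hypothetical placeholders rather than the authors' lexicon, and the LLM scoring stage is omitted.

```python
import re
from dataclasses import dataclass

# Hypothetical category -> term lists; the paper's actual lexicon is not given.
CATEGORY_TERMS = {
    "enhanced_transmissibility": ["transmissibility", "airborne transmission"],
    "immune_evasion": ["immune escape", "antibody evasion"],
    "host_range": ["host range", "cross-species"],
}

@dataclass
class Preprint:
    doi: str
    title: str
    abstract: str

def compile_patterns(terms_by_cat):
    """Compile one case-insensitive, word-bounded regex per category."""
    return {
        cat: re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE)
        for cat, terms in terms_by_cat.items()
    }

def lexical_prefilter(preprints, terms_by_cat=CATEGORY_TERMS):
    """Return {doi: [matched categories]} for preprints whose title or abstract hits a term.

    Only these candidates would go on to the (omitted) LLM scoring stage."""
    patterns = compile_patterns(terms_by_cat)
    flagged = {}
    for p in preprints:
        text = f"{p.title} {p.abstract}"
        hits = sorted(cat for cat, pat in patterns.items() if pat.search(text))
        if hits:
            flagged[p.doi] = hits
    return flagged

papers = [
    Preprint("10.1/a", "Airborne transmission in ferrets", "We study host range."),
    Preprint("10.1/b", "Protein folding benchmarks", "No relevant terms."),
]
print(lexical_prefilter(papers))
```

A cheap lexical pass like this trades recall for cost. Broad term lists keep the expensive LLM stage focused on a small candidate set.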

2605.28488 2026-05-29 stat.ML cs.LG math.ST stat.TH

Bridging Maximum Likelihood and Optimal Transport for Efficient Inference and Model Selection in Stochastic Block Models

Simon Queric, Cédric Vincent-Cuaz, Charles Bouveyron, Marco Corneli

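AI summary: This paper studies stochastic block models from an optimal transport perspective. It proposes regularized and unregularized semi-relaxed Gromov-Wasserstein estimators that jointly infer clusters and model parameters and automatically select the number of clusters.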

Comments 10 pages, 8 figures

Abstract

We study inference in stochastic block models (SBMs) through the lens of optimal transport (OT). We first establish that maximum likelihood variational inference (MLVI) can be interpreted as a semi-relaxed Gromov-Wasserstein (srGW) projection with entropic regularization. While this formulation yields accurate clustering, the entropic regularization prevents transport plans from being sparse, hindering intrinsic model selection. Consequently, we investigate unregularized srGW estimators, and prove that they consistently recover both the SBM connectivity matrix and latent cluster assignments in the asymptotic regime. However, this asymptotic property does not translate into reliable model selection in finite samples, and calls for additional mechanisms to promote sparsity in the inferred cluster proportions. We empirically show that such a regularized formulation yields estimators that simultaneously recover model parameters and select the number of clusters in a single optimization problem, thereby avoiding costly grid search or heuristic model selection procedures.
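
The claim that entropic regularization rules out sparse transport plans can be checked on a toy problem. The sketch below runs standard Sinkhorn iterations for entropic OT between two small discrete distributions. This is plain OT, not the semi-relaxed Gromov-Wasserstein problem from the paper. The resulting plan is diag(u) K diag(v), and K = exp(-C/eps) is strictly positive, so in exact arithmetic every entry is nonzero. An unregularized OT solution between equal-size uniform distributions, by contrast, can be a permutation matrix.

```python
import numpy as np

def sinkhorn(a, b, C, eps, n_iter=500):
    """Entropic OT via Sinkhorn: returns P = diag(u) K diag(v) with K = exp(-C / eps)."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)    # match row marginals
        v = b / (K.T @ u)  # match column marginals
    return u[:, None] * K * v[None, :]

# Three source and three target points on a line, uniform weights.
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.1, 1.1, 2.1])
C = (x[:, None] - y[None, :]) ** 2  # squared-distance cost
a = b = np.full(3, 1 / 3)

P = sinkhorn(a, b, C, eps=0.5)
print(np.round(P, 3))
print("min entry:", P.min())  # strictly positive: the plan has no exact zeros
```

Shrinking `eps` concentrates mass on the diagonal but never sets off-diagonal entries exactly to zero. This is why an explicit sparsity mechanism is needed to prune clusters.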

2605.27968 2026-05-29 cs.CE cs.LG physics.comp-ph

Adapting Automotive Aerodynamics Surrogates to New Vehicle Families via Transfer Learning

Seunghwan Keum, Alok Warey

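AI summary: Using leave-one-family-out experiments on a 61.47M-parameter Transformer surrogate, this paper compares full fine-tuning, lightweight fine-tuning, and low-rank adaptation (LoRA). LoRA's rank-constrained adapters regularize the loss landscape while preserving pretrained features. With only 20 samples, LoRA reaches R² = 0.85 ± 0.02 and outperforms both full fine-tuning and from-scratch training, which recasts LoRA as a convergence enabler for geometry transfer.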

Comments 23 pages, 12 figures

Abstract

Deploying Scientific Machine Learning surrogates in industrial CFD workflows requires adapting pretrained models to new vehicle families without large datasets; yet whether geometric representations learned by a geometry encoder transfer to topologically distinct shapes remains unvalidated. We address this through leave-one-family-out experiments on a 61.47M-parameter Transformer surrogate (AB-UPT) pretrained on four vehicle families (411 external aerodynamics cases) and adapted to the held-out fifth with only 20 samples. Three strategies are compared: Full Fine-Tuning (FFT), Lightweight Fine-Tuning (LFT), and Low-Rank Adaptation (LoRA). The central finding is that pretrained geometry encoders learn transferable representations, but the adaptation mechanism determines whether they can be exploited. FFT destabilizes as 61.47M unconstrained parameters overfit to 20 samples (R^2=0.40); LFT fails because the frozen encoder cannot represent unseen shapes (R^2<0). LoRA resolves both: rank-constrained adapters injected into all layers regularize the loss landscape while preserving pretrained features, achieving R^2=0.85+/-0.02 across all five families with 50% lower force RMSE than FFT and 28% lower pointwise field errors. LoRA also outperforms from-scratch training using 3x more target-family data, eliminating the need for large per-family datasets. These results recast LoRA from a memory-saving convenience into a convergence enabler for geometry transfer: a shared backbone paired with lightweight per-family adapters trainable in hours from minimal data.
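
A rank-constrained adapter of the kind LoRA injects can be sketched for a single linear layer. The pretrained weight W stays frozen and only a low-rank update B A is trained. B is initialized to zero, so the adapted layer starts out identical to the pretrained one. The layer sizes, rank `r`, and scaling `alpha` below are illustrative assumptions, not the AB-UPT configuration.

```python
import numpy as np

class LoRALinear:
    """y = x W^T + (alpha / r) * x A^T B^T, with W frozen and only A, B trainable."""

    def __init__(self, W, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                                        # frozen pretrained weight (d_out, d_in)
        self.A = rng.normal(0.0, 0.01, size=(r, d_in))    # trainable down-projection
        self.B = np.zeros((d_out, r))                     # trainable up-projection, zero-init
        self.scale = alpha / r

    def __call__(self, x):
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self):
        return self.A.size + self.B.size

d_in, d_out = 256, 256
W = np.random.default_rng(1).normal(size=(d_out, d_in))
layer = LoRALinear(W, r=4)
x = np.ones((2, d_in))

# With B = 0 the adapter is a no-op, so outputs match the pretrained layer exactly.
assert np.allclose(layer(x), x @ W.T)
print(layer.trainable_params(), "trainable vs", W.size, "frozen")  # 2048 trainable vs 65536 frozen
```

Because B A has rank at most r, the trainable parameter count per layer drops from d_out·d_in to r·(d_in + d_out). This is the constraint the abstract credits with stabilizing adaptation from 20 samples.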

2605.27480 2026-05-29 q-bio.OT cs.AI cs.CY

BIRDS: Characterizing and Understanding Biodiversity Impact of Large Language Model Serving

Tianyao Shi, Yi Ding

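AI summary: Proposes the BIRDS framework, which defines request-level functional units, quantifies operational and embodied biodiversity impact, and introduces Quality-Normalized Biodiversity Impact (QNBI). It reveals the cumulative ecosystem impact of large-scale LLM serving and quality-aware serving tradeoffs.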

Comments 21 pages, 27 figures, 9 tables

Abstract

Large language model (LLM) serving creates environmental impacts beyond carbon and water, including ecosystem damage through biodiversity-related pathways. We present BIRDS, a framework for Biodiversity Impact of Request-Driven LLM Serving. BIRDS defines request-level functional units, quantifies operational and embodied biodiversity impact, and introduces Quality-Normalized Biodiversity Impact (QNBI) to jointly analyze ecological impact and response quality. Across diverse workloads, models, GPUs, and regions, BIRDS reveals that biodiversity impact accumulates at scale and exposes actionable quality-aware serving tradeoffs.

2605.27474 2026-05-29 stat.ML cs.LG

Stop Suppressing the Tail: Causal Inference for Extreme Events

Eichi Uehara

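AI summary: For heavy-tailed outcomes, proposes an average dose-response function (ADRF) estimator whose median-centered tail diagnostic (PDHTE+JK) breaks circular dependence. The estimator outputs a structured tail shape and deep-tail risk measures, and it significantly outperforms conventional methods in extreme-event prediction.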

Comments 22 pages, 6 figures, 13 tables. Keywords: double machine learning, dose-response, heavy tails, extreme value theory, causal inference

Abstract

Estimating how an outcome responds to a continuous treatment (the Average Dose-Response Function, or ADRF) is a core causal-inference primitive. However, when outcomes are heavy-tailed, standard robust double machine learning (DML) deliberately suppresses the extremes to stabilize the bulk average. In high-stakes settings, such as financial returns or climate losses, this omitted 1-in-1000 extreme event is the actual target quantity. Furthermore, current methods that read the tail from a model's residuals suffer from circular dependence: tail-shape inferences shift drastically merely because the core estimator is switched between Huber and Welsch. We propose an ADRF estimator that emits a structured tail-shape output alongside the standard point estimate. Its tail diagnostic (PDHTE+JK) evaluates the per-treatment tail shape from the outcome centered by a pilot median. This breaks the circular dependence and makes the diagnostic invariant to the choice of core method. The output comprises four treatment-conditional quantities, together with an explicit refusal mechanism that declines extrapolation when the data do not support extreme-value modeling. The four quantities are the tail shape $\hat{\xi}(t)$, deep-tail return levels $\hat{Q}_\alpha(t)$, conditional shortfalls $\hat{S}_\alpha(t)$, and the recovered mean ADRF. Compared with kernel-weighted quantile regression (QR), the proposed estimator reduces deep-tail ($\alpha=0.001$) return-level MAE by 11% and conditional-shortfall MAE by 25.5% across a heavy-tailed panel. It also reduces MAE by 20-29% in sample-scarce regimes ($n\le2000$). On freMTPL2 motor-insurance claims, it triggers an explicit extrapolation refusal on the log-claim scale, which neither QR nor loss-only DML can produce.
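
The sketch below makes the tail-shape quantities concrete. It estimates a tail index xi, a deep-tail return level Q_alpha, and a conditional shortfall S_alpha from a sample that is first centered by its median. It substitutes the classical Hill estimator with Weissman extrapolation for the paper's PDHTE+JK diagnostic and ignores the treatment variable t. The choice of `k` upper order statistics is an assumption. So is the refusal rule, which declines when xi >= 1 because the tail mean is then infinite; it only illustrates the idea of refusing to extrapolate.

```python
import math
import random

def tail_summary(sample, k, alpha):
    """Hill/Weissman sketch on the upper tail of a median-centered sample.

    Returns (xi, Q_alpha, S_alpha) on the original scale, or None when it refuses."""
    xs = sorted(sample)
    n = len(xs)
    med = xs[n // 2] if n % 2 else 0.5 * (xs[n // 2 - 1] + xs[n // 2])
    z = [x - med for x in xs]   # median-centered outcome, still sorted
    threshold = z[-k - 1]       # (k+1)-th largest value: tail threshold
    if threshold <= 0:
        return None             # too few points above the median for a Hill fit
    xi = sum(math.log(v / threshold) for v in z[-k:]) / k  # Hill estimate of xi
    if xi >= 1:
        return None             # infinite tail mean: refuse to extrapolate
    q = threshold * (k / (n * alpha)) ** xi  # Weissman return level at exceedance prob alpha
    s = q / (1 - xi)            # mean exceedance beyond q under a Pareto tail
    return xi, q + med, s + med # shift back to the original scale

rng = random.Random(0)
moderate = [rng.paretovariate(2.0) for _ in range(20000)]  # true xi = 0.5
heavy = [rng.paretovariate(0.5) for _ in range(20000)]     # true xi = 2.0
print(tail_summary(moderate, k=500, alpha=0.001))
print(tail_summary(heavy, k=500, alpha=0.001))  # None: extrapolation refused
```

Median centering shifts the data before the Hill fit and biases the estimate slightly upward on the moderate sample. A real implementation would need a more careful threshold choice than a fixed `k`.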

2605.27382 2026-05-29 cs.HC cs.AI cs.CL

The Alignment Floor: How Persona Customization Breaks Safety in Weakly-Aligned LLMs

Xing Zhang, Guanghui Wang, Yanwei Cui, Wei Qiu, Ziyuan Li, Bing Zhu, Peiyang He

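AI summary: By contrasting how sycophancy rates change across persona conditions in strongly and weakly aligned models, the paper defines the alignment floor Δ_floor as an audit metric for the safety of persona customization.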

Abstract

Telling an LLM to "be enthusiastic" raises its sycophancy rate from 30% to 50% on a lightly-aligned model, but has zero effect on a strongly-aligned one. We define this gap as the alignment floor, $\Delta_{\text{floor}}(m)=\max_p S(m,p)-\min_p S(m,p)$, the range of sycophancy rates a model produces across persona conditions, and treat sycophancy as a persona-conditional property rather than a fixed model property. Pluralistic AI relies on behavioral adaptation via persona prompts like "be creative" or "be thorough", which let systems respect diverse user values and communication styles; the safety question is how much customization a given model can absorb before its truthfulness shifts. We present a controlled case study contrasting a strongly-aligned RLHF + Constitutional-AI model (Claude Sonnet 4.6) with a more lightly-aligned model (Amazon Nova Lite), spanning seven persona conditions and five tasks for 1800 total runs. An existence-pair result motivates per-model auditing: there is at least one strongly-aligned model with $\Delta_{\text{floor}}=5$pp (within 5pp of the 15% control rate) and at least one lightly-aligned model with 45pp (5%-50% range). On the lightly-aligned model, all five Big Five personas increase sycophancy over control, and counterintuitively Agreeableness produces the smallest increase, not the largest. The single largest effect in the study is constructive: a Skeptic persona reduces sycophancy by 25pp on the lightly-aligned model, and is the only persona that instructs resistance against user claims rather than engagement with them, suggesting a directionality account. Cross-model transfer of persona effects is near-zero, so persona-alignment testing must be per-model. We propose $\Delta_{\text{floor}}$ as a deployment-time audit metric: measure it on a small persona panel before deploying persona customization.
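
The alignment floor is a range statistic over persona conditions, so the audit the abstract proposes can be sketched in a few lines. The control, enthusiastic, and skeptic rates below are chosen to be consistent with the figures quoted for the lightly-aligned model. The agreeable rate and the 10 pp review budget in `needs_review` are hypothetical.

```python
def alignment_floor(rates_by_persona):
    """Delta_floor(m) = max_p S(m, p) - min_p S(m, p), in percentage points."""
    rates = list(rates_by_persona.values())
    return max(rates) - min(rates)

def needs_review(rates_by_persona, max_floor_pp=10):
    """Flag a model whose sycophancy range across personas exceeds a budget.

    The 10 pp default is a hypothetical deployment policy, not from the paper."""
    return alignment_floor(rates_by_persona) > max_floor_pp

# Sycophancy rates (%) on a small persona panel for one model.
# "agreeable" is a hypothetical value; the others mirror the quoted figures.
lightly_aligned = {"control": 30, "enthusiastic": 50, "agreeable": 35, "skeptic": 5}
print(alignment_floor(lightly_aligned), needs_review(lightly_aligned))  # 45 True
```

Because the metric is a max-minus-min over the panel, adding personas can only keep it the same or widen it. A small panel therefore gives a lower bound on a model's true floor.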

2605.26255 2026-05-29 eess.IV cs.AI cs.LG

Prospective evaluation of multimodal respiratory failure prediction: Do chest X-rays improve performance beyond EHR signals?

Xiaolei Lu, Shamim Nemati

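AI summary: This study proposes a gated multimodal framework that integrates structured EHR time-series data with chest X-ray foundation-model representations. The framework prospectively predicts whether ICU patients will need invasive mechanical ventilation within 24 hours. Compared with an EHR-only model and physician predictions, multimodal fusion improved discrimination, sensitivity, and positive predictive value.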

Abstract

Early prediction of respiratory failure is critical for timely clinical intervention in intensive care units. Existing electronic health record (EHR)-based models can continuously monitor physiologic deterioration, but they may not fully capture pulmonary pathophysiology reflected in chest radiographs (CXRs). In this study, we ask whether CXR information improves prospective prediction of invasive mechanical ventilation beyond EHR signals alone. We develop a gated multimodal framework that integrates structured EHR time-series data with CXR foundation-model representations. The gating module adaptively controls the contribution of imaging features based on patient-specific clinical context, allowing the model to selectively rely on imaging information when it is informative. We prospectively evaluate the framework for predicting invasive mechanical ventilation within 24 hours in ICU patients and compare it with an established EHR-only model (Ventio), physician predictions obtained at matched clinical time points, and alternative multimodal variants. The gated multimodal models achieved higher discrimination than the EHR-only baseline, with AUROC values of 0.860 and 0.858 using REMEDIS and MedInsight CXR representations, respectively, compared with 0.752 for Ventio. Relative to physician predictions, the multimodal framework substantially improved sensitivity while maintaining favorable specificity. Compared with the EHR-only model, multimodal integration increased specificity and positive predictive value, suggesting that CXR information can refine risk estimation in selected patients. These findings support adaptive multimodal fusion as a practical strategy for incorporating imaging into prospective respiratory failure prediction.
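
A gated fusion of the kind described can be sketched as follows: a patient-specific gate scales how much the imaging embedding contributes. Several details are assumptions for illustration, since the abstract does not specify the paper's gate architecture:

- the embedding dimensions;
- a scalar sigmoid gate computed from the EHR representation;
- additive fusion.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_fusion(h_ehr, h_cxr, W_g, b_g, W_proj):
    """Fuse EHR and CXR embeddings with a per-patient scalar gate in (0, 1).

    h_ehr: (batch, d) EHR time-series embedding
    h_cxr: (batch, d_img) CXR foundation-model embedding
    """
    g = sigmoid(h_ehr @ W_g + b_g)          # (batch, 1): imaging weight from clinical context
    fused = h_ehr + g * (h_cxr @ W_proj)    # imaging contributes in proportion to g
    return fused, g

rng = np.random.default_rng(0)
d, d_img, batch = 16, 32, 4
h_ehr = rng.normal(size=(batch, d))
h_cxr = rng.normal(size=(batch, d_img))
W_g = rng.normal(size=(d, 1)) * 0.1
W_proj = rng.normal(size=(d_img, d)) * 0.1

fused, g = gated_fusion(h_ehr, h_cxr, W_g, b_g=0.0, W_proj=W_proj)
print(fused.shape, g.ravel())  # (4, 16) and four gate values in (0, 1)

# A strongly negative bias closes the gate, recovering an EHR-only representation.
fused_closed, _ = gated_fusion(h_ehr, h_cxr, W_g, b_g=-50.0, W_proj=W_proj)
print(np.allclose(fused_closed, h_ehr, atol=1e-6))  # True
```

A closed gate reduces the model to its EHR-only path. This is one way a gated design can fall back gracefully when imaging is uninformative.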

2605.25556 2026-05-29 cs.LO cs.AI

Keep the Proof State Live: Snapshotting for Efficient Tactic Search in Lean 4

Austin Shen, Yunong Shi

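AI summary: Parallel tactic search in Lean 4 incurs large overhead from repeatedly reconstructing proof states. To address this, the paper proposes proof-state snapshotting, which captures the proof state once and reuses it, achieving a 5.6-50x speedup.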

Comments 11 pages, 1 figure. v2: Added co-author affiliation (Amazon Web Services) and contact emails for both authors

Abstract

Automated theorem proving systems built on Lean 4 increasingly rely on parallel tactic search over partially specified proofs, such as those generated by Draft-Sketch-Prove (DSP) pipelines. In current systems, each search branch reconstructs a proof state by re-running elaboration, leading to substantial per-branch overhead. In Lean 4 with Mathlib, this cost has two components: (1) import loading, which deserializes pre-compiled libraries (~60 s per branch); and (2) theorem-body elaboration, which re-checks the theorem context up to the target goal (estimated 18-735 s depending on proof complexity). Together, these account for >99% of per-branch wall time, making portfolio-based search impractical at scale. We observe that this overhead arises from a mismatch between the structure of proof search and its execution model: branching is implemented via repeated reconstruction of proof states rather than direct reuse. To address this, we introduce proof-state snapshotting, which captures the elaborated proof state once and reuses it across branches via a small extension to the Lean 4 language server. Across 48 miniF2F-v2 problems (45 prove-phase benchmarks and 3 full end-to-end runs), our approach achieves a 5.6-50x wall-time speedup over the standard fallback (average 14x, median 9.7x). Speedup increases with the number of proof branches. Our method is orthogonal to import-level caching (e.g., Kimina Lean Server), which avoids import loading but not theorem-body elaboration. The patched Lean binary and the Snapshot-DSP pipeline will be released as open source upon publication.
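
A simple cost model shows why the speedup grows with the number of branches. Without snapshots, every branch pays import loading plus theorem-body elaboration before its own tactic search. With snapshots, that setup is paid once and each branch pays only a small restore cost. The model assumes branches run sequentially on one worker. Two timings come from the abstract: 60 s import, and 100 s elaboration, which falls inside the reported 18-735 s range. The 2 s restore and 5 s search costs are assumptions. This is a back-of-the-envelope illustration, not the paper's measurement methodology.

```python
def wall_time_rebuild(n_branches, t_import, t_elab, t_search):
    """Each branch re-imports libraries and re-elaborates the theorem body."""
    return n_branches * (t_import + t_elab + t_search)

def wall_time_snapshot(n_branches, t_import, t_elab, t_search, t_restore):
    """Setup is paid once; each branch restores the snapshot and then searches."""
    return t_import + t_elab + n_branches * (t_restore + t_search)

# Hypothetical timings in seconds (see lead-in for which come from the abstract).
T = dict(t_import=60.0, t_elab=100.0, t_search=5.0)
for n in (1, 8, 64):
    speedup = wall_time_rebuild(n, **T) / wall_time_snapshot(n, t_restore=2.0, **T)
    print(f"{n:>3} branches: {speedup:.1f}x")  # prints 1.0x, 6.1x, 17.4x
```

As n grows, the ratio approaches (t_import + t_elab + t_search) / (t_restore + t_search), about 23.6x with these numbers. The heavier the elaboration, the larger the asymptotic gain.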

2605.18587 2026-05-29 q-bio.GN cs.LG

PACE: Geometry-Aware Bridge Transport for Single-Cell Trajectory Inference

Chenglei Yu, Chuanrui Wang, Bangyan Liao, Tailin Wu

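AI summary: Asynchronous development misaligns trajectories in single-cell trajectory inference. To address this, the paper proposes PACE, which constructs an anisotropic Riemannian metric, alternately optimizes cross-time couplings and neural bridges, and distills a global velocity field. Across seven datasets, PACE reduces MMD, Wasserstein-1, and Wasserstein-2 distances by 23.7% on average.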

Comments 31 pages, 12 figures

Abstract

Single-cell trajectory inference from destructive time-course snapshots is fundamentally ill-posed: neither cross-time cell correspondences nor continuous trajectories are observed, so the snapshot distributions alone do not uniquely determine the underlying dynamics. Existing optimal transport and flow-based methods typically couple cells by Euclidean proximity at observed clock times, which can misalign trajectories when development is asynchronous and cells sampled at the same experimental time occupy different latent pseudotime stages. We propose PACE, a trajectory inference framework that recovers geometry-consistent continuous transport dynamics from destructive time-course snapshots through three coupled components. First, PACE constructs a state- and time-dependent anisotropic Riemannian metric that assigns low transport cost along locally supported tangent directions while penalizing normal velocity components. Second, it alternates between refining cross-time couplings under the induced path-action cost and fitting endpoint-preserving neural bridges between adjacent snapshots. Third, it distills the learned bridge dynamics into a global continuous-time velocity field over cellular states. Across seven controlled and biological datasets covering nine held-out reconstruction experiments, PACE achieves the strongest overall reconstruction performance, reducing MMD, Wasserstein-1 distance, and Wasserstein-2 distance by 23.7% on average relative to the strongest competing baseline. PACE also improves RNA-velocity alignment by 15.4% on an embryoid body differentiation benchmark, without requiring explicit cell pairing, lineage tracing, or RNA-velocity supervision during training. Code is available at https://github.com/AI4Science-WestlakeU/PACE.
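
The anisotropic metric can be illustrated with a local-PCA construction. First, estimate a tangent subspace from a cell's nearest neighbors. Then charge velocity components along that subspace at unit cost and normal components at a higher cost `lam`. The neighborhood size `k`, tangent dimension `dim`, and `lam` are assumptions. The construction is a stand-in for PACE's state- and time-dependent metric, not its published definition.

```python
import numpy as np

def local_metric(X, x, k=10, dim=1, lam=10.0):
    """Metric G = P_T + lam * P_N at x, from local PCA of its k nearest neighbors."""
    d2 = ((X - x) ** 2).sum(axis=1)
    nbrs = X[np.argsort(d2)[:k]]
    centered = nbrs - nbrs.mean(axis=0)
    # Leading right-singular vectors span the estimated tangent directions.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    U = Vt[:dim].T                    # (D, dim) tangent basis
    P_T = U @ U.T                     # projector onto the tangent space
    P_N = np.eye(X.shape[1]) - P_T    # projector onto the normal space
    return P_T + lam * P_N

def kinetic_cost(G, v):
    """Instantaneous path-action cost v^T G v."""
    return float(v @ G @ v)

# Cells along a 1-D curve embedded in 2-D: points on the x-axis with tiny noise.
rng = np.random.default_rng(0)
X = np.column_stack([np.linspace(0, 1, 200), 0.001 * rng.normal(size=200)])
G = local_metric(X, X[100])
c_tan = kinetic_cost(G, np.array([1.0, 0.0]))   # moving along the curve
c_norm = kinetic_cost(G, np.array([0.0, 1.0]))  # leaving the curve
print(c_tan, c_norm)  # roughly 1 and 10
```

Under such a metric, minimum-action paths prefer to stay on the local data manifold. This is the intuition behind the low tangential cost and penalized normal velocity described in the abstract.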

2605.12208 2026-05-29 stat.ML cs.AI cs.LG stat.CO

Self-Supervised Laplace Approximation for Bayesian Uncertainty Quantification

Julian Rodemann, Alexander Marquard, Thomas Augustin, Michele Caprio

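AI summary: Proposes the Self-Supervised Laplace Approximation (SSLA), which directly approximates the posterior predictive distribution by refitting on self-predicted data. This yields deterministic, sampling-free Bayesian uncertainty quantification that outperforms the classical Laplace approximation on regression tasks.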

Comments Accepted for publication in TMLR (https://openreview.net/forum?id=T8w8L2t3JG), v2: fixed typos and added a deceased-author footnote with a dedication to Thomas Augustin

Journal ref
Transactions on Machine Learning Research (TMLR). ISSN 2835-8856 (2026)

Abstract

Approximate Bayesian inference typically revolves around computing the posterior parameter distribution. In practice, however, the main object of interest is often a model's predictions rather than its parameters. In this work, we propose to bypass the parameter posterior and focus directly on approximating the posterior predictive distribution. We achieve this by drawing inspiration from self-training within self-supervised and semi-supervised learning. Essentially, we quantify a Bayesian model's predictive uncertainty by refitting on self-predicted data. The idea is strikingly simple: If a model assigns high likelihood to self-predicted data, these predictions are of low uncertainty, and vice versa. This yields a deterministic, sampling-free approximation of the posterior predictive. The modular structure of our Self-Supervised Laplace Approximation (SSLA) further allows us to plug in different prior specifications, enabling classical Bayesian sensitivity (w.r.t. prior choice) analysis. In order to bypass expensive refitting, we further introduce an approximate version of SSLA, called ASSLA. We study (A)SSLA both theoretically and empirically in regression models ranging from Bayesian linear models to Bayesian neural networks. Across a wide array of regression tasks with simulated and real-world datasets, our methods outperform classical Laplace approximations in predictive calibration while remaining computationally efficient.

2605.07210 2026-05-29 cs.IR cs.CL

DiffRetriever: Parallel Representative Tokens for Retrieval with Diffusion Language Models

Shuai Wang, Yu Yin, Shengyao Zhuang, Bevan Koopman, Guido Zuccon

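AI summary: Proposes DiffRetriever, which performs retrieval directly from a diffusion language model's masked-position predictions. It supports both single- and multi-representation retrieval and outperforms existing methods on multiple benchmarks.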

Comments Updated analysis, ablation and benchmark with sota retrievers, indexing storage/latency ablation, isolating the effectiveness gain

Abstract

This paper shows how diffusion language models (DLMs) can be used as effective and efficient retrievers. Existing DLM-based retrievers (e.g., DiffEmbed) follow BERT-style encoding, representing each query or passage as a single mean-pooled vector. This ignores how DLMs are trained to generate responses through masked-position prediction under bidirectional attention, a capability that can provide stronger retrieval signals. We propose DiffRetriever, which uses the DLM's native masked-position prediction directly for retrieval. For each query or passage, DiffRetriever appends one or more masked positions, using the outputs as retrieval representations in a single forward pass. With one masked position, single-representation DiffRetriever already improves over DiffEmbed on the same backbones. DiffRetriever also naturally extends to multi-representation retrieval: DLMs process multiple masked positions jointly, enabling ColBERT-style fine-grained matching with little additional encoding latency. In autoregressive LLM retrievers, the same multi-representation strategy requires sequential decoding and therefore incurs much higher latency. DiffRetriever obtains the strongest aggregate effectiveness within our matched comparison, outperforming DiffEmbed, PromptReps, and RepLLaMA. Masked-position counts selected on training data transfer well across datasets, while per-query variation suggests headroom for adaptive allocation. Code is available at https://github.com/ielab/diffretriever.
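
Multi-representation retrieval with several masked positions can be scored ColBERT-style. Each query vector is matched to its most similar passage vector, and these maxima are summed (MaxSim). The sketch assumes the per-position output vectors have already been extracted from the DLM; random vectors stand in for them here. Cosine similarity is also an assumed choice.

```python
import numpy as np

def normalize(M):
    return M / np.linalg.norm(M, axis=1, keepdims=True)

def maxsim(Q, D):
    """Late interaction: sum over query vectors of the max cosine similarity to any doc vector.

    Q: (n_q, d) query representations, one per masked position
    D: (n_d, d) passage representations
    """
    sims = normalize(Q) @ normalize(D).T  # (n_q, n_d) cosine similarities
    return float(sims.max(axis=1).sum())

def rank(Q, passages):
    """Return passage indices sorted by descending MaxSim score."""
    scores = [maxsim(Q, D) for D in passages]
    return sorted(range(len(passages)), key=lambda i: -scores[i])

rng = np.random.default_rng(0)
d = 64
Q = rng.normal(size=(4, d))  # e.g. 4 masked positions appended to the query
relevant = np.vstack([Q + 0.1 * rng.normal(size=Q.shape), rng.normal(size=(4, d))])
passages = [rng.normal(size=(8, d)) for _ in range(5)] + [relevant]
print(rank(Q, passages)[0])  # 5: the passage built from perturbed query vectors ranks first
```

With a single masked position on each side, MaxSim reduces to ordinary cosine similarity between two vectors. That case corresponds to the single-representation setting.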

2604.23354 2026-05-29 eess.AS cs.AI eess.SP

Explainable AI in Speaker Recognition -- Making Latent Representations Understandable

Yanze Xu, Wenwu Wang, Mark D. Plumbley

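AI summary: This paper applies hierarchical clustering algorithms (SLINK and HDBSCAN) to analyze hierarchical clustering phenomena in speaker recognition network representations. It also designs the HCCM algorithm and the Liebig score to give these clusters semantic interpretations.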

Comments 15 pages, 10 figures

Abstract

Neural networks can be trained to learn task-relevant representations from data. Understanding how these networks make decisions falls within the Explainable AI (XAI) domain. This paper proposes to study an XAI topic: uncovering the unknown organisation in the representations, particularly those a speaker recognition network learns from utterances, for recognising speaker identity. Past studies have employed algorithms (e.g. K-means) to analyse how network representations can be naturally organised into independent clusters in different ways, i.e., to analyse flat clustering phenomena within the space defined by these representations, referred to as the network representation space. In contrast, this work applies two algorithms, Single-Linkage Clustering (SLINK) and Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN), to analyse how representations form hierarchical clusters in different ways, i.e., to analyse hierarchical clustering phenomena within the network representation space. To further understand these hierarchical clustering phenomena, we propose a new algorithm termed Hierarchical Cluster-Class Matching (HCCM). HCCM provides a semantic interpretation for the hierarchical clusters produced by SLINK and HDBSCAN by matching them to predefined semantic classes. Through this process, some clusters are interpreted as individual semantic classes (e.g. male), whereas others are interpreted as conjunctions of individual semantic classes (e.g. female and Ireland). In addition, we develop a new metric, the Liebig score, to quantify how well a cluster matches a semantic class, which helps identify the factor that most strongly limits each match.
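
The cluster-to-class matching step can be illustrated with single-linkage clustering plus a simple overlap score. The sketch has three parts:

- It builds the single-linkage hierarchy with a naive Kruskal-style union-find merge, not the SLINK algorithm itself.
- It scores each cluster against a semantic class as min(precision, recall), and the smaller term names the limiting factor.
- It represents conjunctions of classes (for example female and Ireland) simply as intersected index sets.

Treating the min-based score as the Liebig score is an assumption suggested by the name (Liebig's law of the minimum); the abstract does not give the exact definition. The points and classes are hypothetical.

```python
import math
from itertools import combinations

def single_linkage_clusters(points):
    """Every cluster formed while merging by single linkage, as frozensets of indices."""
    n = len(points)
    parent = list(range(n))
    members = {i: {i} for i in range(n)}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(combinations(range(n), 2), key=lambda e: math.dist(points[e[0]], points[e[1]]))
    clusters = [frozenset({i}) for i in range(n)]
    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
            members[ri] |= members.pop(rj)
            clusters.append(frozenset(members[ri]))
    return clusters

def match_score(cluster, klass):
    """Hypothetical Liebig-style score: min(precision, recall) and which one limits it."""
    overlap = len(cluster & klass)
    precision, recall = overlap / len(cluster), overlap / len(klass)
    limiting = "precision" if precision < recall else "recall"
    return min(precision, recall), limiting

def best_matches(points, classes):
    """For each semantic class, the cluster in the hierarchy that matches it best."""
    clusters = single_linkage_clusters(points)
    out = {}
    for name, klass in classes.items():
        best = max(clusters, key=lambda c: match_score(c, klass)[0])
        out[name] = (sorted(best), *match_score(best, klass))
    return out

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
classes = {"male": frozenset({0, 1, 2}), "female": frozenset({3, 4, 5}),
           "female_and_ireland": frozenset({3, 4}), "ireland": frozenset({2, 3, 4})}
for name, match in best_matches(points, classes).items():
    print(name, match)
```

In this toy case the "ireland" class is best matched by cluster [3, 4] with score 2/3, limited by recall: the cluster is pure but misses one member of the class.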

2604.23256 2026-05-29 cs.NE cs.AI cs.LG cs.SC

Architecture-Induced Recoverability Bias in Differentiable Symbolic Regression

Chakshu Gupta, Theodore J. LaGrow

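AI summary: This paper studies how the variable-routing architecture affects expression recoverability in differentiable symbolic regression. It finds that recovery varies from 0/64 to 64/64 across architectures, and it proposes validation-based architecture selection that raises recovery from 34.4% to 50.1%.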

Comments 6 pages, 4 figures, 3 tables; submitted to IEEE MLSP 2026

Abstract

Symbolic regression aims to recover closed-form expressions from numerical data, but in differentiable symbolic regression the recovered expression depends not only on the grammar but also on the fixed architecture through which variables are routed during training. This is relevant to signal-processing settings in which closed-form models and interpretable nonlinear structure are useful. This architecture-specific effect has rarely been isolated directly, because existing comparisons often vary architecture together with operator family, grammar, or search procedure. Three depth-3 architectures are compared across twenty-four operator--shape--leaf combinations, holding operator family, grammar, and training protocol fixed as far as possible while varying the variable-routing architecture. Recovery changes from $0/64$ to $64/64$ trials on the same target under an architecture-plus-native-training-protocol comparison. The best architecture on one target is the worst on another, and trees with two equal-depth subtrees fail in every configuration tested ($0/3{,}776$). As a proof-of-concept mitigation, a small architecture set is trained and the hardened expression with the lowest held-out RMSE is selected. On the jointly-run subset, this improves recovery from $34.4\%$ for the only architecture present in all three configurations to $50.1\%$. On a Shockley diode target, the validation selector recovers cases missed by that baseline architecture, which by itself recovers $0/32$ seeds. Since the jointly-run subset contains only three configurations, the selector result is evidence that validation-based architecture selection is promising, not a complete benchmark. These results support treating architecture as a measurable design variable that should be reported, stress-tested, and selected using held-out validation rather than fixed a priori.
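
Validation-based architecture selection reduces to one step. Among the hardened expressions produced by a small set of architectures, pick the one with the lowest RMSE on held-out data. In the sketch, the candidate expressions are hypothetical stand-ins for the outputs of three trained architectures. The toy target y = exp(x) - 1, a normalized diode-like form, is also an assumption.

```python
import math

def rmse(f, xs, ys):
    return math.sqrt(sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

def select_by_validation(candidates, xs_val, ys_val):
    """Pick the hardened expression with the lowest held-out RMSE."""
    scored = {name: rmse(f, xs_val, ys_val) for name, f in candidates.items()}
    best = min(scored, key=scored.get)
    return best, scored

# Toy diode-like target y = exp(x) - 1 (normalized form, an assumption).
xs_val = [0.1 * i for i in range(21)]  # held-out inputs on [0, 2]
ys_val = [math.exp(x) - 1 for x in xs_val]

# Hypothetical expressions recovered by three architectures.
candidates = {
    "arch_A: x + x**2/2": lambda x: x + x ** 2 / 2,
    "arch_B: exp(x) - 1": lambda x: math.exp(x) - 1,
    "arch_C: x": lambda x: x,
}
best, scores = select_by_validation(candidates, xs_val, ys_val)
print(best)  # arch_B: exp(x) - 1
```

The selector is architecture-agnostic. It only needs each candidate to be callable on held-out inputs, so adding an architecture to the set costs one more training run.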

2604.17176 2026-05-29 eess.SY cs.AI cs.SY math.OC

Intent-aligned Autonomous Spacecraft Guidance via Reasoning Models

Yuji Takubo, Simone D'Amico

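AI summary: Proposes an intent-aligned spacecraft guidance framework that links high-level reasoning with safe trajectory optimization through behavior sequences and waypoint constraints. In close-proximity operation scenarios, it achieves over 90% SCP convergence and a 1.5× higher rate than heuristic decision-making of generating trajectories that satisfy the top intent-prioritized performance criteria.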

Comments Accepted for Computer Vision and Pattern Recognition Conference (CVPR) 2026, AI4Space Workshop (4-page Short paper). 9 pages, 3 figures (including supplementary materials)

Abstract

Future spacecraft operations require autonomy that can interpret high-level mission intent while preserving safety. However, existing trajectory optimization still relies heavily on expert-crafted formulations and does not support intent-conditioned decision-making. This paper proposes an intent-aligned spacecraft guidance framework that links high-level reasoning and safe trajectory optimization through explicit intermediate abstractions, based on behavior sequences and waypoint constraints. A foundation model first predicts an intent-aligned behavior plan, a waypoint generation model then converts it into waypoint constraints, and the safe trajectory is computed via optimization. This decomposition enables scalable supervision without sacrificing safety. Numerical experiments in close-proximity operation scenarios demonstrate that the proposed pipeline achieves over 90\% SCP convergence and yields a $1.5\times$ higher rate of generating trajectories that satisfy the top intent-prioritized performance criteria than heuristic decision-making. These results support the use of intermediate behavior abstraction as a practical interface between foundation-model reasoning and safety-critical onboard spacecraft autonomy.

2604.13410 2026-05-29 stat.ME cs.LG stat.ML

Estimating Continuous Treatment Effects with Two-Stage Kernel Ridge Regression

Seok-Jin Kim, Kaizheng Wang

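AI summary: Addresses estimation of the effect function for a continuous treatment with a two-stage kernel ridge regression method. The first stage models the response as a function of treatment and covariates, and the second stage constructs pseudo-outcomes that correct for distribution shift. The method attains optimal learning bounds without estimating the conditional treatment density and supports data-driven model selection.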

Abstract

We study the problem of estimating the effect function for a continuous treatment, which maps each treatment value to a population-averaged outcome. A central challenge in this setting is confounding: treatment assignment often depends on covariates, creating selection bias that makes direct regression of the response on treatment unreliable. To address this issue, we propose a two-stage kernel ridge regression method. In the first stage, we learn a model for the response as a function of both treatment and covariates; in the second stage, we use this model to construct pseudo-outcomes that correct for distribution shift, and then fit a second model to estimate the treatment effect. Although the response varies with both treatment and covariates, the induced effect function obtained by averaging over covariates is typically much simpler, and our estimator adapts to this structure. Our optimal learning bounds are achieved without estimating the conditional treatment density, thereby bypassing a major bottleneck in existing methods. Furthermore, we introduce a fully data-driven model selection procedure that achieves provable adaptivity to both the unknown degree of overlap and the spectral decay of the underlying kernel.
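A simplified plug-in version of the two-stage idea can be sketched with NumPy. The sketch makes several assumptions: Gaussian kernels and fixed hyperparameters are used, and stage one fits $\hat f(t, x)$ by kernel ridge regression on treatment and covariates. Pseudo-outcomes then average $\hat f(T_i, X_j)$ over the observed covariates, and stage two regresses them on treatment. This averaging rule is an illustrative assumption. It is not the authors' pseudo-outcome construction, and it omits their model-selection procedure.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth):
    """Gram matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 h^2)) for 2-D arrays."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def krr_fit(z, y, bandwidth, lam):
    """Kernel ridge regression; returns a predictor for new 2-D inputs."""
    k = gaussian_kernel(z, z, bandwidth)
    alpha = np.linalg.solve(k + lam * len(z) * np.eye(len(z)), y)
    return lambda z_new: gaussian_kernel(z_new, z, bandwidth) @ alpha

def two_stage_effect(t, x, y, bandwidth=0.5, lam=1e-3):
    """Estimate theta(t) = E_X[f(t, X)] with a plug-in two-stage KRR (illustrative)."""
    n = len(t)
    f_hat = krr_fit(np.column_stack([t, x]), y, bandwidth, lam)   # stage 1: y on (t, x)
    # Row i*n + j of the grid is (T_i, X_j); averaging over j marginalizes the covariates.
    grid = np.column_stack([np.repeat(t, n), np.tile(x, n)])
    pseudo = f_hat(grid).reshape(n, n).mean(axis=1)
    return krr_fit(t[:, None], pseudo, bandwidth, lam)           # stage 2: pseudo-outcome on t

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
t = 0.3 * x + 0.95 * rng.normal(size=n)         # treatment depends on the covariate
y = np.sin(t) + x + 0.1 * rng.normal(size=n)    # true effect theta(t) = sin(t), since E[X] = 0
theta_hat = two_stage_effect(t, x, y)
```

Note that `theta_hat` takes treatment values as a column array, for example `theta_hat(np.array([[0.5]]))`.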

2604.13147 2026-05-29 stat.ML cs.LG math.PR

Adaptive Learning via Off-Model Training and Importance Sampling for Fully Non-Markovian Optimal Stochastic Control. Complete version

Dorival Leão, Alberto Ohashi, Simone Scotti, Adolfo M. D da Silva

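AI summary: For continuous-time stochastic control problems that are fully non-Markovian and depend on unknown model parameters, proposes a Monte Carlo learning method based on a discrete skeleton and importance sampling. The method yields an off-model training architecture and adaptive parameter updates, and the paper provides non-asymptotic error bounds.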

Comments Typos are fixed. Numerical experiment is revised

Abstract

This paper studies continuous-time stochastic control problems whose controlled states are fully non-Markovian and depend on unknown model parameters. Such problems arise naturally in path-dependent stochastic differential equations, rough-volatility hedging, and systems driven by fractional Brownian motion. Building on the discrete skeleton approach developed in earlier work, we propose a Monte Carlo learning methodology for the associated embedded backward dynamic programming equation. Our main contribution is twofold. First, we construct explicit dominating training laws and Radon--Nikodym weights for several representative classes of non-Markovian controlled systems. This yields an off-model training architecture in which a fixed synthetic dataset is generated under a reference law, while the dynamic programming operators associated with a target model are recovered by importance sampling. Second, we use this structure to design an adaptive update mechanism under parametric model uncertainty, so that repeated recalibration can be performed by reweighting the same training sample rather than regenerating new trajectories. For fixed parameters, we establish non-asymptotic error bounds for the approximation of the embedded dynamic programming equation via deep neural networks. For adaptive learning, we derive quantitative estimates that separate Monte Carlo approximation error from model-risk error. Numerical experiments illustrate both the off-model training mechanism and the adaptive importance-sampling update in structured linear-quadratic examples.
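The reweighting idea can be illustrated in a toy setting. A fixed sample is drawn once from a reference law, and expectations under several target models are then recovered with Radon-Nikodym (likelihood-ratio) weights instead of regenerating data. The Gaussian laws, the `reweighted_mean` helper, and the use of self-normalization are assumptions for illustration. The paper's dominating laws for non-Markovian systems are considerably more involved.

```python
import numpy as np

def log_normal_pdf(x, mean, std):
    """Log density of N(mean, std^2) evaluated at x."""
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std * np.sqrt(2.0 * np.pi))

def reweighted_mean(sample, payoff, ref, target):
    """Self-normalized importance-sampling estimate of E_target[payoff(X)].

    `sample` was drawn once under the reference law `ref = (mean, std)`; the
    weights are the likelihood ratio d(target)/d(ref) evaluated on that sample.
    """
    log_w = log_normal_pdf(sample, *target) - log_normal_pdf(sample, *ref)
    w = np.exp(log_w - log_w.max())   # subtract the max for numerical stability
    w /= w.sum()
    return float(np.sum(w * payoff(sample)))

rng = np.random.default_rng(1)
reference = (0.0, 2.0)                          # wide reference law dominates the targets
sample = rng.normal(*reference, size=200_000)   # generated once, reused below

# Recalibrate to several candidate parameters without drawing new samples.
estimates = {mu: reweighted_mean(sample, lambda x: x ** 2, reference, (mu, 1.0))
             for mu in (-0.5, 0.0, 0.5, 1.0)}
# Exact value under N(mu, 1): E[X^2] = mu^2 + 1.
```

Choosing a reference law wider than every target keeps the likelihood ratios bounded, which keeps the weight variance under control as the parameter is updated.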

2604.06811 2026-05-29 cs.CR cs.AI

SkillTrojan: Backdoor Attacks on Skill-Based Agent Systems

Yunhao Feng, Yifan Ding, Yingshui Tan, Boren Zheng, Yanming Guo, Xiaolong Li, Kun Zhai, Yishan Li, Wenke Huang

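AI summary: Proposes SkillTrojan, a backdoor attack that targets skill implementations rather than model parameters. It embeds malicious logic in seemingly normal skills and uses skill composition to reconstruct and execute an attacker-specified payload, achieving a high attack success rate while preserving benign behavior.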

Abstract

Skill-based agent systems tackle complex tasks by composing reusable skills, improving modularity and scalability while introducing a largely unexamined security attack surface. We propose SkillTrojan, a backdoor attack that targets skill implementations rather than model parameters or training data. SkillTrojan embeds malicious logic inside otherwise plausible skills and leverages standard skill composition to reconstruct and execute an attacker-specified payload. The attack partitions an encrypted payload across multiple benign-looking skill invocations and activates only under a predefined trigger. SkillTrojan also supports automated synthesis of backdoored skills from arbitrary skill templates, enabling scalable propagation across skill-based agent ecosystems. To enable systematic evaluation, we release a dataset of 3,000+ curated backdoored skills spanning diverse skill patterns and trigger-payload configurations. We instantiate SkillTrojan in a representative code-based agent setting and evaluate both clean-task utility and attack success rate. Our results show that skill-level backdoors can be highly effective with minimal degradation of benign behavior, exposing a critical blind spot in current skill-based agent architectures and motivating defenses that explicitly reason about skill composition and execution. Concretely, on EHR SQL, SkillTrojan attains up to 97.2% attack success rate (ASR) while maintaining 89.3% clean accuracy (ACC) on GPT-5.2-1211-Global.

2604.05446 2026-05-29 stat.ML cs.LG

MEC: Machine-Learning-Assisted Generalized Entropy Calibration for Semi-Supervised Mean Estimation

Se Yoon Lee, Jae Kwang Kim

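AI summary: Proposes MEC, which improves prediction-powered inference through cross-fitted calibration weighting. It attains the semiparametric efficiency bound in semi-supervised mean estimation and improves confidence-interval coverage and precision.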

Abstract

Obtaining high-quality labels is costly, whereas unlabeled covariates are often abundant, motivating semi-supervised inference methods with reliable uncertainty quantification. Prediction-powered inference (PPI) leverages a machine-learning predictor trained on a small labeled sample to improve efficiency, but it can lose efficiency under model misspecification and suffer from coverage distortions due to label reuse. We introduce Machine-Learning-Assisted Generalized Entropy Calibration (MEC), a cross-fitted, calibration-weighted variant of PPI. MEC improves efficiency by reweighting labeled samples to better align with the target population, using a principled calibration framework based on Bregman projections. This yields robustness to affine transformations of the predictor and relaxes requirements for validity by replacing conditions on raw prediction error with weaker projection-error conditions. As a result, MEC attains the semiparametric efficiency bound under weaker assumptions than existing PPI variants. Across simulations and a real-data application, MEC achieves near-nominal coverage and tighter confidence intervals than CF-PPI and vanilla PPI.
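Calibration weighting can be sketched with the simplest generalized-entropy choice, exponential tilting, which corresponds to the Kullback-Leibler (Shannon-entropy) member of the family. Labeled units are reweighted so that the weighted mean of the predictor on the labeled sample matches its mean over all units. The weighted label mean is then the estimate. This sketch omits cross-fitting and any additional calibration constraints a full method would use. `calibrated_mean` is a hypothetical helper, not MEC itself.

```python
import numpy as np

def tilt_weights(f_lab, lam):
    """Exponential-tilting weights w_i proportional to exp(lam * f_i), summing to one."""
    z = lam * f_lab
    w = np.exp(z - z.max())
    return w / w.sum()

def calibrated_mean(y_lab, f_lab, f_all, tol=1e-10):
    """Estimate E[Y] by calibrating labeled weights to the full-sample predictor mean.

    Solves sum_i w_i(lam) f_i = mean(f_all) by bisection; the weighted mean of f
    is nondecreasing in lam because its derivative is a weighted variance.
    """
    target = f_all.mean()
    if not (f_lab.min() < target < f_lab.max()):
        raise ValueError("calibration target outside the range of labeled predictions")
    lo, hi = -1.0, 1.0
    while tilt_weights(f_lab, lo) @ f_lab > target:   # expand the bracket downward
        lo *= 2.0
    while tilt_weights(f_lab, hi) @ f_lab < target:   # expand the bracket upward
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tilt_weights(f_lab, mid) @ f_lab < target:
            lo = mid
        else:
            hi = mid
    w = tilt_weights(f_lab, 0.5 * (lo + hi))
    return float(w @ y_lab)

rng = np.random.default_rng(2)
f_all = rng.normal(size=5_000)                    # predictions for every unit
lab = rng.choice(5_000, size=300, replace=False)  # indices of the labeled units
f_lab = f_all[lab]
y_lab = f_lab + 0.5 + 0.3 * rng.normal(size=300)  # labels with a shift the predictor misses
estimate = calibrated_mean(y_lab, f_lab, f_all)
```

If the labels are an exact affine function of the predictor, the calibrated estimate reproduces that affine function of the full-sample predictor mean. This is one simple way to see the robustness to affine transformations that the abstract mentions.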

2604.01473 2026-05-29 cs.CR cs.AI

SelfGrader: LLM Jailbreak Detection via Anchored Token-Level Logits

Zikai Zhang, Rui Hu, Olivera Kotevska, Jiahao Xu

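AI summary: Proposes SelfGrader, which recasts jailbreak detection as a numerical grading problem using anchored token-level logits, achieving robust detection with low latency and a low false positive rate.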

Abstract

Large Language Models (LLMs) are powerful tools for answering user queries, yet they remain highly vulnerable to jailbreak attacks. Existing guardrail methods typically rely on internal features or textual responses to detect malicious queries, which either introduce substantial latency or suffer from randomness in text generation. To overcome these limitations, we propose SelfGrader, a lightweight guardrail method that formulates jailbreak detection as a numerical grading problem using anchored token-level logits. Specifically, SelfGrader evaluates the safety of a user query within a compact set of numerical tokens (NTs) (e.g., 0-9) and interprets their logit distribution as an internal safety signal. To align these signals with the target safety rubric, SelfGrader constructs Probably Approximately Correct-guided ICL anchor examples and introduces a dual-perspective scoring rule that considers both the maliciousness and benignness of the query, yielding a stable and interpretable score that reflects harmfulness and reduces the false positive rate simultaneously. Extensive experiments across diverse jailbreak benchmarks, adaptive attacks, benign prompt benchmarks, multiple LLMs, and state-of-the-art guardrail baselines demonstrate that SelfGrader achieves strong robustness with low false positive rates, memory overhead, and latency.
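The grading step can be pictured as turning next-token logits over the digit tokens 0 to 9 into an expected score. The sketch below assumes the logits for the ten digit tokens have already been extracted for two prompts, one asking how malicious the query is and one asking how benign it is. It then combines the two grades by simple averaging. Both prompts and the combination rule are hypothetical stand-ins for the paper's dual-perspective scoring, and no model API is called.

```python
import math
from typing import Sequence

def expected_digit_score(digit_logits: Sequence[float]) -> float:
    """Softmax over logits for tokens '0'..'9', returned as an expected grade in [0, 9]."""
    if len(digit_logits) != 10:
        raise ValueError("expected one logit per digit token 0-9")
    m = max(digit_logits)
    probs = [math.exp(v - m) for v in digit_logits]
    total = sum(probs)
    return sum(d * p / total for d, p in enumerate(probs))

def dual_perspective_score(malicious_logits, benign_logits) -> float:
    """Hypothetical combination: average the maliciousness grade with the inverted
    benignness grade, rescaled to [0, 1] (higher means more harmful)."""
    s_mal = expected_digit_score(malicious_logits)
    s_ben = expected_digit_score(benign_logits)
    return ((s_mal + (9.0 - s_ben)) / 2.0) / 9.0

def is_jailbreak(malicious_logits, benign_logits, threshold: float = 0.5) -> bool:
    """Flag the query when the combined harmfulness score reaches the threshold."""
    return dual_perspective_score(malicious_logits, benign_logits) >= threshold
```

Using the full digit distribution rather than the single most likely digit gives a continuous score, so the decision threshold can be tuned without resampling text.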

2603.14778 2026-05-29 cs.CR cs.AI

P$^2$RAG: Efficient Privacy-Preserving RAG Service Supporting Arbitrary Top-$k$ Retrieval

Yulong Ming, Mingyue Wang, Jijia Yang, Jie Xu, Zihan Wu, Cong Wang, Xiaohua Jia

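AI summary: Existing privacy-preserving RAG systems cannot support large $k$ flexibly and are inefficient. To address this, the paper proposes P$^2$RAG, which is based on secret sharing and interactive bisection, avoids sorting overhead, and achieves efficient and secure retrieval for arbitrary top-$k$.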

Comments 14 pages, 3 figures

Abstract

Retrieval-Augmented Generation (RAG) enables large language models to use external knowledge, but outsourcing the RAG service raises privacy concerns for both data owners and users. Privacy-preserving RAG systems address these concerns by performing secure top-$k$ retrieval, which is typically implemented using secure sorting to identify relevant documents. However, existing systems face challenges supporting arbitrary $k$ due to their inability to change $k$, new security issues, and in particular, efficiency degradation with large $k$. This is a significant limitation because applications such as finance, law, and healthcare require a $k$ that is large enough to cause huge overhead for existing systems. Also, modern long-context models generally achieve higher accuracy with larger retrieval sets. We propose P$^2$RAG, an efficient privacy-preserving RAG service that supports arbitrary top-$k$ retrieval. Unlike existing systems, P$^2$RAG avoids sorting candidate documents. Instead, it uses an interactive bisection method to determine the set of top-$k$ documents. For security, P$^2$RAG uses secret sharing on two semi-honest non-colluding servers to protect the data owner's database and the user's prompt. It enforces restrictions and verification to defend against malicious users and tightly bounds the information leakage of the database. The experiments show that P$^2$RAG is 3--300$\times$ faster than the state-of-the-art PRAG for $k = 16$--$1024$.
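In plaintext, the bisection idea amounts to searching for a score threshold that admits exactly $k$ documents, using only comparisons and counts rather than a full sort. The sketch below assumes distinct relevance scores and ignores the secret-sharing layer entirely. In P$^2$RAG, the comparisons would instead run interactively over shares held by the two servers. `topk_by_bisection` is a hypothetical name.

```python
from typing import List, Sequence

def topk_by_bisection(scores: Sequence[float], k: int, max_iters: int = 200) -> List[int]:
    """Return indices of the k highest scores by bisecting on a threshold.

    Invariant: count(s >= lo) >= k and count(s >= hi) < k, so the k-th largest
    score lies in [lo, hi). Each step needs one comparison pass and no sorting.
    Assumes distinct scores; with ties, more than k indices may be returned.
    """
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and len(scores)")

    def count_at_least(tau: float) -> int:
        return sum(1 for s in scores if s >= tau)

    lo, hi = min(scores), max(scores) + 1.0
    for _ in range(max_iters):
        if count_at_least(lo) == k:
            break
        mid = 0.5 * (lo + hi)
        if count_at_least(mid) >= k:
            lo = mid
        else:
            hi = mid
    return [i for i, s in enumerate(scores) if s >= lo]
```

The number of rounds depends on the gap between the $k$-th and $(k+1)$-th scores rather than on $k$, which is consistent with the claim that the approach avoids sorting costs that grow with large $k$.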

2603.14644 2026-05-29 eess.IV cs.CV cs.DB cs.LG

LUMINA: A Multi-Vendor Mammography Benchmark with Energy Harmonization Protocol

Hongyi Pan, Gorkem Durak, Halil Ertugrul Aktas, Andrea M. Bejar, Baver Tutun, Emre Uysal, Ezgi Bulbul, Mehmet Fatih Dogan, Berrin Erok, Berna Akkus Yildirim, Sukru Mehmet Erturk, Ulas Bagci

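AI summary: Existing FFDM datasets are small, sparsely annotated, and lack vendor diversity. To address this, the paper introduces the multi-vendor LUMINA dataset and an energy harmonization method that reduces domain shift by aligning foreground pixels, and it shows performance gains on diagnosis, BI-RADS classification, and density estimation.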

Comments This paper was accepted to CVPR 2026

Abstract

Publicly available full-field digital mammography (FFDM) datasets remain limited in size, clinical annotations, and vendor diversity, hindering the development of robust models. We introduce LUMINA, a curated, multi-vendor FFDM dataset that explicitly encodes acquisition energy and vendor metadata to capture clinically relevant appearance variations often overlooked in existing benchmarks. This dataset contains 1824 images from 468 patients (960 benign, 864 malignant), with pathology-confirmed labels, BI-RADS assessments, and breast-density annotations. LUMINA spans six acquisition systems and includes both high- and low-energy imaging styles, enabling systematic analysis of vendor- and energy-induced domain shifts. To address these variations, we propose a foreground-only pixel-space alignment method ("energy harmonization") that maps images to a low-energy reference while preserving lesion morphology. We benchmark CNN and transformer models on three clinically relevant tasks: diagnosis (benign vs. malignant), BI-RADS classification, and density estimation. Two-view models consistently outperform single-view models. EfficientNet-B0 achieves an AUC of 93.54% for diagnosis, while Swin-T achieves the best macro-AUC of 89.43% for density prediction. Harmonization improves performance across architectures and produces more localized Grad-CAM responses. Overall, LUMINA provides (1) a vendor-diverse benchmark and (2) a model-agnostic harmonization framework for reliable and deployable mammography AI.
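The abstract does not spell out the harmonization mapping, so the sketch below substitutes foreground-only histogram (quantile) matching. Pixels inside a breast mask are mapped onto the intensity distribution of a low-energy reference image's foreground, while background pixels are left untouched. The masks, the matching rule, and the function name are assumptions for illustration.

```python
import numpy as np

def harmonize_foreground(image, mask, ref_image, ref_mask):
    """Quantile-match foreground pixels of `image` to the foreground of `ref_image`.

    Each foreground pixel is replaced by the reference-foreground intensity at the
    same empirical quantile. The map is monotone, so the intensity ordering inside
    the foreground is preserved; background pixels are copied unchanged.
    """
    out = image.astype(float).copy()
    src = image[mask].astype(float)
    ref = np.sort(ref_image[ref_mask].astype(float))
    # Rank of every source pixel among foreground pixels (ties broken by position).
    ranks = np.argsort(np.argsort(src, kind="stable"), kind="stable")
    quantiles = (ranks + 0.5) / src.size
    ref_q = (np.arange(ref.size) + 0.5) / ref.size
    out[mask] = np.interp(quantiles, ref_q, ref)
    return out

rng = np.random.default_rng(3)
img = rng.uniform(0, 100, size=(8, 8))          # toy "high-energy" image
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True                           # toy foreground (breast) mask
ref_img = rng.uniform(200, 300, size=(8, 8))    # toy low-energy reference
ref_mask = mask.copy()
out = harmonize_foreground(img, mask, ref_img, ref_mask)
```

Restricting the mapping to the foreground keeps the large background region from dominating the intensity statistics, which is the motivation the abstract gives for a foreground-only method.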

2602.13238 2026-05-29 cs.NI cs.LG

Securing SIM-Assisted Wireless Networks via Quantum Reinforcement Learning

Le-Hung Hoang, Quang-Trung Luu, Dinh Thai Hoang, Diep N. Nguyen, Van-Dinh Nguyen

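AI summary: To handle high-dimensional optimization and dynamic environments in SIM-assisted wireless networks, proposes a hybrid quantum proximal policy optimization framework that jointly optimizes power allocation and SIM phase shifts. It achieves approximately 15% higher secrecy rate and 30% faster convergence.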

Comments Submitted to IEEE TCOM: 13 pages

Abstract

Stacked intelligent metasurfaces (SIMs) have recently emerged as a powerful wave-domain technology that enables multi-stage manipulation of electromagnetic signals through multilayer programmable architectures. While SIMs offer unprecedented degrees of freedom for enhancing physical-layer security, their extremely large number of meta-atoms leads to a high-dimensional and strongly coupled optimization space, making conventional design approaches inefficient and difficult to scale. Moreover, existing deep reinforcement learning (DRL) techniques suffer from slow convergence and performance degradation in dynamic wireless environments with imperfect knowledge of passive eavesdroppers. To address these challenges, we propose a hybrid quantum proximal policy optimization (Q-PPO) framework for SIM-assisted secure communications that jointly optimizes transmit power allocation and SIM phase shifts to maximize the average secrecy rate under power and quality-of-service constraints. Specifically, a parameterized quantum circuit is embedded into the actor network, forming a hybrid classical-quantum policy architecture that enhances policy representation capability and exploration efficiency in high-dimensional continuous action spaces. Extensive simulations demonstrate that the proposed Q-PPO scheme consistently outperforms DRL baselines, achieving approximately 15% higher secrecy rates and 30% faster convergence under imperfect eavesdropper channel state information. These results establish Q-PPO as a powerful optimization paradigm for SIM-enabled secure wireless networks.
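The policy-update side of Q-PPO builds on PPO's clipped surrogate objective, shown below in NumPy for a batch of probability ratios and advantage estimates. The parameterized quantum circuit in the actor, the SIM channel model, and the secrecy-rate reward are not modeled. The value `eps = 0.2` is a common PPO default, not a value taken from the paper.

```python
import numpy as np

def ppo_clipped_objective(logp_new, logp_old, advantages, eps=0.2):
    """PPO clipped surrogate: batch mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    r = pi_new(a|s) / pi_old(a|s) is computed from log-probabilities of the taken
    actions; the actor is updated to maximize this quantity.
    """
    ratio = np.exp(np.asarray(logp_new, dtype=float) - np.asarray(logp_old, dtype=float))
    adv = np.asarray(advantages, dtype=float)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv
    return float(np.mean(np.minimum(unclipped, clipped)))
```

The clipping caps how far one update can move the policy away from the data-collecting policy. This is why PPO is a common base for actors whose parameterization, here partly quantum, may be harder to optimize.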

2602.11760 2026-05-29 stat.ML cs.LG

Aggregate Models, Not Explanations: Improving Feature Importance Estimation

Joseph Paillard, Angel Reyero Lobo, Denis A. Engemann, Bertrand Thirion

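AI summary: Addresses inaccurate feature-importance estimation. A theoretical analysis shows that model-level ensembling reduces error more effectively than explanation-level ensembling, and the finding is validated on benchmarks and proteomic data.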

Abstract

Feature-importance methods show promise in transforming machine learning models from predictive engines into tools for scientific discovery. However, due to data sampling and algorithmic stochasticity, expressive models can be unstable, leading to inaccurate variable importance estimates and undermining their utility in critical biomedical applications. Although ensembling offers a solution, deciding whether to explain a single ensemble model or aggregate individual model explanations is difficult due to the nonlinearity of importance measures and remains largely understudied. Our theoretical analysis, developed under assumptions accommodating complex state-of-the-art ML models, reveals that this choice is primarily driven by the model's excess risk. In contrast to prior literature, we show that ensembling at the model level provides more accurate variable-importance estimates, particularly for expressive models, by reducing this leading error term. We validate these findings on classical benchmarks and a large-scale proteomic study from the UK Biobank.
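A short calculation shows why the order of aggregation matters for nonlinear importance measures. Purely for illustration, take a feature's importance to be its squared coefficient $\beta^2$ across $M$ refitted models. Averaging explanations gives $\frac{1}{M}\sum_m \beta_m^2 = \bar\beta^2 + \widehat{\mathrm{Var}}(\beta)$, whereas explaining the averaged model gives $\bar\beta^2$. Explanation-level aggregation therefore inherits the between-model variance as an upward bias. The squared-coefficient measure and the toy coefficients are assumptions. The paper's analysis covers general importance measures and identifies the model's excess risk as the driving factor.

```python
import numpy as np

def importance(beta):
    """Toy nonlinear importance measure: the squared coefficient."""
    return np.square(beta)

# Coefficients of one feature across M = 5 models refitted on resampled data (toy values).
betas = np.array([0.8, 1.2, 1.0, 0.6, 1.4])

aggregate_explanations = importance(betas).mean()   # explain each model, then average
aggregate_models = importance(betas.mean())         # average the models, then explain
between_model_var = betas.var()                     # population variance (ddof=0)
```

Here the gap between the two aggregates is exactly the between-model variance, 0.08.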

2602.08567 2026-05-29 cs.MA cs.CL

ValueFlow: Measuring the Propagation of Value Perturbations in Multi-Agent LLM Systems

Jinnuo Liu, Chuke Liu, Hua Shen

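AI summary: Proposes ValueFlow, which uses a 56-value dataset and an LLM-as-a-judge protocol to decompose value drift into agent-level response behavior and system-level structural effects, showing that value alignment is a system-level property.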

Comments Preprint. Under review. 28 pages, 10 figures

Abstract

Multi-agent large language model (LLM) systems increasingly consist of agents that observe and respond to one another's outputs. While value alignment is typically evaluated for isolated models, how value perturbations propagate through agent interactions remains poorly understood. We present ValueFlow, a perturbation-based framework that measures value drift in multi-agent systems via a 56-value valuation dataset derived from the Schwartz Value Survey, with agent value orientations scored using an LLM-as-a-judge protocol. ValueFlow decomposes value drift into agent-level response behavior and system-level structural effects, captured by two metrics: $\beta$-susceptibility, an agent's sensitivity to perturbed peer value signals, and system susceptibility (SS), the effect of node-level perturbations on final system outputs. Experiments span value dimensions, backbones, personas, and topologies, and show that susceptibility varies sharply across values and is strongly shaped by interaction structure. This indicates that value alignment in multi-agent systems is a system-level property, not just an agent-level one. ValueFlow thus provides a principled basis for auditing and mitigating value propagation in deployed multi-agent systems.

2602.00324 2026-05-29 math.OC cs.CV cs.RO eess.SP

Dual Quaternion SE(3) Synchronization with Recovery Guarantees

Jianing Zhao, Linglingzhi Zhu, Anthony Man-Cho So

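AI summary: Uses a dual quaternion representation to perform SE(3) synchronization via spectral initialization and a dual quaternion generalized power method, with error bounds and linear convergence guarantees.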

Comments ICML 2026

Abstract

Synchronization over the special Euclidean group SE(3) aims to recover absolute poses from noisy pairwise relative transformations and is a core primitive in robotics and 3D vision. Standard approaches often require multi-step heuristic procedures to recover valid poses, which are difficult to analyze and typically lack theoretical guarantees. This paper adopts a dual quaternion representation and formulates SE(3) synchronization directly over unit dual quaternions. A two-stage algorithm is developed: a spectral initializer computed via the power method on a Hermitian dual quaternion measurement matrix, followed by a dual quaternion generalized power method (DQGPM) that enforces feasibility through per-iteration projection. Estimation error bounds are established for spectral estimators, and DQGPM is shown to admit a finite-iteration error bound and to achieve linear error contraction up to an explicit noise-dependent threshold. Experiments on synthetic benchmarks and real-world multi-scan point-set registration demonstrate that the proposed pipeline improves both accuracy and efficiency over representative matrix-based methods.
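Dual-quaternion algebra is too long for a short example. The same two-stage pattern can instead be shown on the simpler phase-synchronization problem over unit complex numbers, that is, rotations in SO(2). Stage one is spectral initialization by the power method, and stage two is a generalized power method that projects onto the constraint set at every iteration. This is a swapped-in analog, not DQGPM. The noise model, problem size, and iteration counts are assumptions.

```python
import numpy as np

def project_unit(z):
    """Entrywise projection onto unit-modulus complex numbers (zeros map to 1)."""
    mag = np.abs(z)
    return np.where(mag > 0, z / np.where(mag > 0, mag, 1.0), 1.0 + 0j)

def phase_sync(C, power_iters=100, gpm_iters=50, seed=0):
    """Recover phases z, up to a global phase, from a Hermitian matrix C ~ z z^H + noise."""
    rng = np.random.default_rng(seed)
    n = C.shape[0]
    v = rng.normal(size=n) + 1j * rng.normal(size=n)
    for _ in range(power_iters):          # stage 1: power method for the top eigenvector
        v = C @ v
        v /= np.linalg.norm(v)
    x = project_unit(v)                   # spectral initializer, made feasible
    for _ in range(gpm_iters):            # stage 2: generalized power method
        x = project_unit(C @ x)           # multiply, then project every iteration
    return x

rng = np.random.default_rng(42)
n = 60
z = np.exp(1j * rng.uniform(0, 2 * np.pi, size=n))        # ground-truth phases
W = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
C = np.outer(z, z.conj()) + 0.5 * (W + W.conj().T) / 2    # Hermitian noisy measurements
x_hat = phase_sync(C)
alignment = np.abs(np.vdot(z, x_hat)) / n                 # 1.0 means exact up to a global phase
```

Measuring alignment with $|z^H \hat x|/n$ accounts for the global-phase ambiguity that every synchronization problem has. In SE(3), the corresponding ambiguity is a global rigid transformation.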

2601.21243 2026-05-29 math.OC cs.LG cs.NA math.NA

Solving the Offline and Online Min-Max Problem of Non-smooth Submodular-Concave Functions: A Zeroth-Order Approach

Amir Ali Farzin, Yuen-Man Pun, Philipp Braun, Tyler Summers, Iman Shames

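AI summary: For min-max problems whose objective is non-smooth and submodular in the minimizing variable and concave in the maximizing variable, proposes a zeroth-order method based on subgradients of the Lovász extension and Gaussian smoothing. It proves convergence to an $\epsilon$-saddle point in the offline case and an $O(\sqrt{N\bar{P}_N})$ duality gap in the online case.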

Abstract

We consider max-min and min-max problems with objective functions that are possibly non-smooth, submodular with respect to the minimiser and concave with respect to the maximiser. We investigate the performance of a zeroth-order method applied to this problem. The method uses the subgradient of the Lovász extension of the objective function with respect to the minimiser, and Gaussian smoothing to estimate the gradient of the smoothed function with respect to the maximiser. We prove that, in expectation, the algorithm converges to an $\epsilon$-saddle point in the offline case. Moreover, we show that, in expectation, the algorithm achieves an $O(\sqrt{N\bar{P}_N})$ online duality gap in the online setting, where $N$ is the number of iterations and $\bar{P}_N$ is the path length of the sequence of optimal decisions. The complexity analysis and hyperparameter selection are presented for all cases. The theoretical results are illustrated via numerical examples.
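The two oracles the method combines can be sketched directly. For a set function $F$ with $F(\emptyset)=0$, a subgradient of its Lovász extension at $x$ is obtained by sorting the coordinates of $x$ in decreasing order and taking marginal gains along the resulting chain of sets. A Gaussian-smoothing estimator approximates the gradient of the smoothed function from function values alone. The helper names, smoothing radius, and sample count are illustrative assumptions, not the paper's hyperparameters.

```python
import numpy as np

def lovasz_subgradient(F, x):
    """Subgradient of the Lovász extension of set function F at x (F(empty set) = 0).

    Sort coordinates decreasingly, x[p0] >= x[p1] >= ..., and set
    g[p_k] = F({p0, ..., p_k}) - F({p0, ..., p_{k-1}}).
    """
    order = np.argsort(-np.asarray(x, dtype=float), kind="stable")
    g = np.zeros(len(x))
    chosen, prev = set(), F(frozenset())
    for i in order:
        chosen.add(int(i))
        cur = F(frozenset(chosen))
        g[i] = cur - prev            # marginal gain of adding coordinate i
        prev = cur
    return g

def gaussian_smoothing_grad(f, y, mu=1e-3, samples=1, rng=None):
    """Zeroth-order estimate of grad f_mu(y): average of (f(y + mu u) - f(y)) / mu * u, u ~ N(0, I)."""
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y, dtype=float)
    fy = f(y)
    est = np.zeros_like(y)
    for _ in range(samples):
        u = rng.normal(size=y.shape)
        est += (f(y + mu * u) - fy) / mu * u
    return est / samples
```

For a modular set function, the Lovász subgradient equals the weight vector at every point. For a linear objective, the smoothing estimator is unbiased for the true gradient. Both facts make convenient sanity checks.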

2601.17670 2026-05-29 cs.PL cs.AI

Grammar-Aware Literate Generative Mathematical Programming with Compiler-in-the-Loop

Roberto Rossi, Steven D. Prestwich

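AI summary: Proposes SyntAGM, which generates readable optimization models in an algebraic modeling language through an iterative generate-compile-assess-revise loop driven by compiler feedback and LLM-based alignment judgments, achieving a better cost-quality trade-off.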

Comments 18 pages, 7 figures

Abstract

Mathematical programming is widely employed across various sectors (such as logistics, energy, and workforce planning) to model and solve industrial optimisation problems, but its use requires substantial domain expertise. Large language models offer a promising way to translate natural-language problem descriptions into optimisation models, yet existing approaches are costly and generally produce models written in general-purpose computer code (e.g. Python), which can be difficult to inspect, validate, and reuse. In this work, we introduce SyntAGM, a system that generates optimisation models in a readable algebraic modelling language through an iterative generate-compile-assess-revise loop. SyntAGM leverages PyOPL, an OPL-like modelling language compiler designed to provide actionable feedback for iterative model repair. To obtain a valid PyOPL model that matches the problem description, SyntAGM mobilises compiler feedback and an LLM-based alignment judge. In addition, it combines in-context exposure to the target language grammar with few-shot retrieval of modelling exemplars. Across multiple benchmarks, SyntAGM achieves a more favourable cost-quality trade-off than established prompting baselines.
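The generate-compile-assess-revise loop can be sketched as a small driver with pluggable callables. `generate`, `compile_model`, and `judge` are hypothetical stand-ins for the LLM call, the PyOPL compiler, and the LLM alignment judge. No actual PyOPL API is assumed, and the toy stubs only mimic the kind of feedback each stage might return.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

@dataclass
class LoopResult:
    model: Optional[str]
    accepted: bool
    history: List[str] = field(default_factory=list)

def synthesize_model(
    description: str,
    generate: Callable[[str, List[str]], str],        # LLM: description + feedback -> model text
    compile_model: Callable[[str], List[str]],        # compiler: model text -> error messages
    judge: Callable[[str, str], Tuple[bool, str]],    # judge: (description, model) -> (aligned?, critique)
    max_rounds: int = 5,
) -> LoopResult:
    feedback: List[str] = []
    history: List[str] = []
    candidate: Optional[str] = None
    for _ in range(max_rounds):
        candidate = generate(description, feedback)        # generate or revise
        errors = compile_model(candidate)                  # compile
        if errors:
            feedback = errors                              # compiler errors drive the next revision
            history.append("compile-error")
            continue
        aligned, critique = judge(description, candidate)  # assess alignment with the description
        if aligned:
            history.append("accepted")
            return LoopResult(candidate, True, history)
        feedback = [critique]
        history.append("misaligned")
    return LoopResult(candidate, False, history)

# Toy stubs (hypothetical) that exercise both failure modes once.
def toy_generate(description, feedback):
    if not feedback:
        return "maximize x"                          # forgets the terminating ';'
    if feedback == ["missing ';'"]:
        return "maximize x;"                         # compiles, but drops the capacity limit
    return "maximize x; subject to { x <= 10; }"

def toy_compile(model):
    return [] if model.rstrip().endswith((";", "}")) else ["missing ';'"]

def toy_judge(description, model):
    ok = "x <= 10" in model
    return ok, "" if ok else "add the capacity constraint x <= 10"

result = synthesize_model("Maximize x subject to capacity 10.", toy_generate, toy_compile, toy_judge)
```

Checking compilation before calling the judge means the more expensive alignment assessment only runs on syntactically valid models.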

2512.10388 2026-05-29 cs.IR cs.AI

The Best of the Two Worlds: Harmonizing Semantic and Hash IDs for Sequential Recommendation

Ziwei Liu, Yejing Wang, Wanyu Wang, Wang Zejian, Qidong Liu, Zijian Zhang, Chong Chen, Wei Huang, Xiangyu Zhao

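AI summary: To address the performance trade-off between head and tail items in sequential recommendation, proposes H2Rec, which harmonizes semantic IDs and hash IDs through a dual-branch architecture and uses a dual-level alignment strategy for knowledge transfer. It achieves a better balance on public benchmarks and a commercial platform.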

Abstract

Conventional Sequential Recommender Systems (SRS) typically assign unique hash IDs (HID) to construct item embeddings, which mainly capture collaborative signals from historical user-item interactions. However, such embeddings are vulnerable in long-tail scenarios where most items are rarely consumed. Recent methods that incorporate auxiliary information often face noisy collaborative sharing from co-occurrence signals or semantic homogeneity caused by flat dense embeddings. In contrast, Semantic IDs (SID), with their support for code sharing and multi-granular semantic modeling, offer a promising alternative. Nevertheless, SID-based methods are hindered by a collaborative overwhelming phenomenon: commonly adopted quantization mechanisms compromise the identifier uniqueness needed to model head items, resulting in a performance trade-off between head and tail items. To address this challenge, we propose H2Rec, a novel framework that harmonizes SID and HID. We design a dual-branch modeling architecture that simultaneously captures the multi-granular semantics of SID while preserving the unique collaborative identity provided by HID. Moreover, we introduce a dual-level alignment strategy to bridge the two representations, enabling effective knowledge transfer and robust preference modeling. Extensive offline experiments on three public benchmarks and online experiments on a large-scale commercial platform demonstrate that H2Rec achieves a better balance between head and tail recommendation quality and consistently outperforms existing baselines.
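A minimal sketch of a dual-branch item representation follows. The semantic branch sums per-level codebook embeddings of an item's semantic ID, so items that share codes share parameters. The hash branch looks up a unique per-item embedding. A cosine-based term then measures how well the two branches agree. The table sizes, fusion by concatenation, and the alignment term are all assumptions for illustration. The abstract does not specify H2Rec's layers or its dual-level alignment losses.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_ITEMS, LEVELS, CODEBOOK, DIM = 1000, 3, 64, 16

# Semantic IDs: each item maps to one code per level (toy random assignment).
sid_codes = rng.integers(0, CODEBOOK, size=(NUM_ITEMS, LEVELS))
sid_tables = rng.normal(scale=0.1, size=(LEVELS, CODEBOOK, DIM))   # one codebook per level
hid_table = rng.normal(scale=0.1, size=(NUM_ITEMS, DIM))           # unique row per item

def sid_embedding(items):
    """Sum of per-level code embeddings; shared codes give shared semantic parameters."""
    codes = sid_codes[items]                                               # (batch, LEVELS)
    return sum(sid_tables[lvl, codes[:, lvl]] for lvl in range(LEVELS))    # (batch, DIM)

def hid_embedding(items):
    """Unique collaborative embedding per item (hash ID lookup)."""
    return hid_table[items]

def item_representation(items):
    """Dual-branch fusion by concatenation -> (batch, 2 * DIM)."""
    return np.concatenate([sid_embedding(items), hid_embedding(items)], axis=1)

def alignment_loss(items):
    """1 - mean cosine similarity between the two branches (lower means better aligned)."""
    a, b = sid_embedding(items), hid_embedding(items)
    cos = (a * b).sum(1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    return float(1.0 - cos.mean())
```

Keeping the hash branch separate preserves a unique identity for each head item, while the shared semantic codes give rarely seen tail items parameters learned from similar items.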

2512.01863 2026-05-29 cond-mat.mes-hall cond-mat.str-el cs.AI

Topological Order in Neural Wavefunctions

Ahmed Abouelkomsan, Max Geier, Liang Fu

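AI summary: Uses an attention-based deep neural network as a variational wavefunction to discover fractional Chern insulator ground states through energy minimization. It also introduces a method for extracting topological degeneracy from a single real-space wavefunction, demonstrating the potential of neural-network variational Monte Carlo for studying strongly correlated topological phases.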

Comments Published version

Journal ref
Phys. Rev. B 113, 205119 (2026)

Abstract

Topologically ordered states are among the most interesting quantum phases of matter that host emergent quasi-particles having fractional charge and obeying fractional quantum statistics. Theoretical study of such states is however challenging owing to their strong-coupling nature that prevents conventional mean-field treatment. Here, we demonstrate that an attention-based deep neural network provides an expressive variational wavefunction that discovers fractional Chern insulator ground states purely through energy minimization without prior knowledge and achieves remarkable accuracy. We introduce an efficient method to extract ground state topological degeneracy -- a hallmark of topological order -- from a single optimized real-space wavefunction in translation-invariant systems by decomposing it into different many-body momentum sectors. Our results establish neural network variational Monte Carlo as a versatile tool for discovering strongly correlated topological phases.
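The momentum-sector decomposition rests on projecting a state onto eigenspaces of the translation operator, $P_k = \frac{1}{L}\sum_{n=0}^{L-1} e^{ikn} T^n$ with $k = 2\pi m/L$. The toy below applies this projection to a vector on an $L$-site ring, where $T$ is a cyclic shift. A many-body neural wavefunction would instead be translated configuration by configuration, and the sector weights would be estimated by Monte Carlo sampling. The example is therefore only a schematic of the projection, not the authors' method.

```python
import numpy as np

def momentum_sector_weights(psi):
    """Weights ||P_k psi||^2 for k = 2*pi*m/L, m = 0..L-1, on an L-site ring.

    T is the cyclic shift (T psi)_j = psi_{j-1}, so a plane wave exp(i k j)
    has T-eigenvalue exp(-i k) and is left unchanged by P_k.
    """
    psi = np.asarray(psi, dtype=complex)
    psi = psi / np.linalg.norm(psi)
    L = psi.size
    shifts = [np.roll(psi, n) for n in range(L)]          # T^n psi
    weights = []
    for m in range(L):
        k = 2 * np.pi * m / L
        proj = sum(np.exp(1j * k * n) * shifts[n] for n in range(L)) / L
        weights.append(float(np.vdot(proj, proj).real))
    return np.array(weights)
```

Because the projectors are orthogonal and sum to the identity, the sector weights of any normalized state add up to one. Degenerate ground states are then distinguished by which momentum sectors carry weight.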

2511.16815 2026-05-29 stat.ML cs.LG

BITS for GAPS: Bayesian Information-Theoretic Sampling for hierarchical GAussian Process Surrogates

Kyla D. Jones, Alexander W. Dowling

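AI summary Proposes the BITS for GAPS framework, which uses Bayesian hierarchical modeling to propagate hyperparameter uncertainty into the sampling criterion, enabling information-theoretic experimental design with Gaussian process surrogate models. A vapor-liquid equilibrium case study shows that it improves predictive accuracy and information gain.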

Journal ref
Computers & Chemical Engineering, 197, 109041 (2026)

English abstract

We introduce Bayesian Information-Theoretic Sampling for hierarchical GAussian Process Surrogates (BITS for GAPS), a framework for information-theoretic experimental design with Gaussian process-based surrogate models. Unlike standard methods, which use fixed or point-estimated hyperparameters in acquisition functions, our approach propagates hyperparameter uncertainty into the sampling criterion through Bayesian hierarchical modeling. In this framework, a latent function receives a Gaussian process prior, while the hyperparameters are assigned additional priors that capture the modeler's knowledge of the governing physical phenomena. Consequently, the acquisition function incorporates uncertainties from both the latent function and its hyperparameters, ensuring that sampling is guided by both data scarcity and model uncertainty. We further establish two theoretical results in this context: a closed-form approximation and a lower bound for the posterior differential entropy. We demonstrate the framework's utility for hybrid modeling with a vapor-liquid equilibrium case study. Specifically, we build a surrogate model for latent activity coefficients in a binary mixture. We construct a hybrid model by embedding the surrogate into an extended form of Raoult's law. This hybrid model then informs distillation design. The case study shows how partial physical knowledge can be translated into a hierarchical Gaussian process surrogate. It also shows that BITS for GAPS increases expected information gain and predictive accuracy by targeting high-uncertainty regions of the Wilson activity model. Overall, BITS for GAPS is a general uncertainty-aware framework for adaptive data acquisition in complex physical systems.
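Propagating hyperparameter uncertainty into the sampling criterion can be sketched as follows. The setup is hypothetical throughout: a 1D GP with an RBF kernel, a fixed, equally weighted set of lengthscale samples standing in for the hierarchical posterior, and a small noise term. Averaging the GP predictive over those samples gives a Gaussian mixture whose differential entropy has no closed form. The sketch therefore bounds it with two textbook results. By concavity of entropy, the mixture entropy is at least the average component entropy. By the maximum-entropy property of the Gaussian, it is at most the entropy of a Gaussian with the mixture's variance. These are not the paper's closed-form approximation or lower bound.

```python
import math


def rbf(x1, x2, ell, sf2=1.0):
    return sf2 * math.exp(-0.5 * (x1 - x2) ** 2 / ell ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def gp_predict(X, y, x_star, ell, noise=1e-4):
    """Predictive mean and variance (including observation noise) for one lengthscale."""
    K = [[rbf(a, b, ell) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    k_star = [rbf(a, x_star, ell) for a in X]
    alpha = solve(K, y)
    v = solve(K, k_star)
    mean = sum(k * a for k, a in zip(k_star, alpha))
    latent_var = rbf(x_star, x_star, ell) - sum(k * vi for k, vi in zip(k_star, v))
    return mean, max(latent_var, 0.0) + noise


def entropy_bounds(X, y, x_star, ells):
    """Lower/upper bounds on the entropy of the equally weighted predictive mixture."""
    preds = [gp_predict(X, y, x_star, ell) for ell in ells]
    w = 1.0 / len(preds)
    # Concavity: H(mixture) >= average of the component Gaussian entropies.
    lower = sum(w * 0.5 * math.log(2 * math.pi * math.e * var) for _, var in preds)
    # Max-entropy: H(mixture) <= entropy of a Gaussian with the mixture's variance.
    m = sum(w * mu for mu, _ in preds)
    mix_var = sum(w * (var + mu ** 2) for mu, var in preds) - m ** 2
    upper = 0.5 * math.log(2 * math.pi * math.e * mix_var)
    return lower, upper


def next_sample(X, y, candidates, ells):
    # Query where even the conservative (lower-bound) entropy is largest.
    return max(candidates, key=lambda x: entropy_bounds(X, y, x, ells)[0])


# Usage with made-up data and lengthscale samples.
X = [0.0, 0.5, 1.0]
y = [math.sin(3 * x) for x in X]
ells = [0.2, 0.5, 1.0]
candidates = [i / 10 for i in range(21)]
x_next = next_sample(X, y, candidates, ells)
```

Points far from the data, such as x = 2.0, receive a larger lower bound than a training input such as x = 0.5, so the criterion steers sampling toward under-explored regions. Because each lengthscale sample contributes its own predictive, the bounds reflect hyperparameter uncertainty rather than a single point estimate.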