arXivDaily: daily arXiv digest, updated Monday to Friday
All subject categories: 2069
2603.14324 2026-06-01 stat.ML cs.LG

Learning-to-Defer with Expert-Conditional Advice

Yannis Montreuil, Leïna Montreuil, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi

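AI summary: Studies Learning-to-Defer when experts can be given additional information (advice) at decision time; proposes an augmented surrogate loss over the composite expert-advice action space and proves its consistency guarantees and its recovery of the optimal policy.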

English abstract

Learning-to-Defer routes each input to the expert that minimizes expected cost, but it assumes that the information available to every expert is fixed at decision time. Many modern systems violate this assumption: after selecting an expert, one may also choose what additional information that expert should receive, such as retrieved documents, tool outputs, or escalation context. We study this problem and call it Learning-to-Defer with advice. We show that a broad family of natural separated surrogates, which learn routing and advice with distinct heads, is inconsistent even in the smallest non-trivial setting. We then introduce an augmented surrogate that operates on the composite expert-advice action space and prove an $\mathcal{H}$-consistency guarantee together with an excess-risk transfer bound, yielding recovery of the Bayes-optimal policy in the limit. Experiments on tabular, language, and multi-modal tasks show that the resulting method improves over standard Learning-to-Defer while adapting its advice-acquisition behavior to the cost regime; a synthetic benchmark confirms the failure mode predicted for separated surrogates.
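The decision rule that the augmented surrogate is meant to recover, in which the expert and its advice are chosen together, can be illustrated with a small sketch. It assumes that an estimate of each expert's expected error under each advice option and an additive cost for acquiring each advice option are already available; both inputs (`expected_error`, `advice_cost`) are hypothetical, and this is not the paper's surrogate loss, only the kind of joint choice over composite actions it targets.

```python
from itertools import product


def best_composite_action(expected_error, advice_cost):
    """Return the (expert, advice) pair with the lowest total expected cost.

    expected_error[e][a]: hypothetical expected loss of expert e when it
        receives advice option a (a = 0 stands for "no extra advice").
    advice_cost[a]: hypothetical cost of acquiring advice option a.
    """
    experts = range(len(expected_error))
    advice_options = range(len(advice_cost))
    # Score every composite action jointly rather than choosing the expert
    # and the advice with two independent heads.
    return min(
        product(experts, advice_options),
        key=lambda ea: expected_error[ea[0]][ea[1]] + advice_cost[ea[1]],
    )


errors = [[0.30, 0.10], [0.20, 0.18]]
print(best_composite_action(errors, [0.0, 0.05]))  # cheap advice -> (0, 1)
print(best_composite_action(errors, [0.0, 0.25]))  # expensive advice -> (1, 0)
```

With cheap advice the sketch routes to expert 0 and buys the advice; once advice becomes expensive it switches to expert 1 without advice, a toy version of adapting advice acquisition to the cost regime.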

2510.10988 2026-06-01 stat.ML cs.LG

Adversarial Robustness in One-Stage Learning-to-Defer

Yannis Montreuil, Letian Yu, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi

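AI summary: For one-stage Learning-to-Defer (L2D), where the predictor and the allocator are trained jointly, proposes the first adversarial robustness framework: it formalizes attacks, designs cost-sensitive adversarial surrogate losses, and establishes theoretical guarantees ($\mathcal{H}$-, $(\mathcal{R}, \mathcal{F})$-, and Bayes consistency). Experiments on benchmark datasets show improved robustness to untargeted and targeted attacks while preserving clean performance.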

English abstract

Learning-to-Defer (L2D) enables hybrid decision-making by routing inputs either to a predictor or to external experts. While promising, L2D is highly vulnerable to adversarial perturbations, which can not only flip predictions but also manipulate deferral decisions. Prior robustness analyses focus solely on two-stage settings, leaving open the end-to-end (one-stage) case where predictor and allocation are trained jointly. We introduce the first framework for adversarial robustness in one-stage L2D, covering both classification and regression. Our approach formalizes attacks, proposes cost-sensitive adversarial surrogate losses, and establishes theoretical guarantees including $\mathcal{H}$, $(\mathcal{R}, \mathcal{F})$, and Bayes consistency. Experiments on benchmark datasets confirm that our methods improve robustness against untargeted and targeted attacks while preserving clean performance.

2605.29511 2026-06-01 cs.MA cs.CL cs.LG

DynaGraph: Lightweight Multi-Model Interaction Framework via Dynamic Topological Reconfiguration

Yanxing Guo, Zihao Zheng, Fangzhou Wu, Ling Liang, Lin Bao, Zongwei Wang, Yimao Cai

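AI summary: Proposes DynaGraph, which uses dynamic topological reconfiguration and multiplexed PEFT adapters to run multi-model collaboration on a single consumer-grade GPU, approaching the reasoning ability of a 72B monolithic model while substantially reducing latency and token consumption.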

English abstract

Tackling complex reasoning tasks typically relies on massive monolithic LLMs, which suffer from severe computational redundancy. While task decomposition through structured pipelines or multi-agent collaborations offers an alternative, these approaches inevitably fall into a critical dilemma: predefined static topologies are highly vulnerable to cascading errors, whereas unconstrained dynamic agents suffer from trajectory divergence and unpredictable memory bloat. To address this, we present DynaGraph, a lightweight multi-model framework driven by dynamic topological reconfiguration. At the execution level, DynaGraph multiplexes time-division PEFT adapters over a shared base model, enabling both full system training and inference deployment on a single consumer-grade GPU. At the routing level, the Evaluator continuously monitors execution confidence to trigger hierarchical self-healing: Fine-grained Patching for localized data gaps and Subgraph Reconstruction for severe logical ruptures. Experiments on StrategyQA, MATH, and FinQA demonstrate that our 8B model closely approximates the reasoning capabilities of a 72B monolithic model (e.g., 87.6% on StrategyQA, 82.7% on MATH). Furthermore, it reduces latency by up to 68.1% and token consumption by 68.6% compared to unconstrained dynamic architectures.
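The Evaluator's two-level repair policy can be pictured as a threshold rule on a confidence score. The thresholds (`patch_below`, `rebuild_below`), the [0, 1] score range, and the function name are assumptions made for illustration; the abstract does not say how DynaGraph actually separates localized data gaps from severe logical ruptures.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"
    PATCH = "fine_grained_patching"
    RECONSTRUCT = "subgraph_reconstruction"


def self_heal_action(confidence, patch_below=0.7, rebuild_below=0.4):
    """Map an evaluator confidence score in [0, 1] to a repair action.

    Both thresholds are hypothetical values chosen for this sketch.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if confidence < rebuild_below:
        # Severe logical rupture: rebuild the affected part of the graph.
        return Action.RECONSTRUCT
    if confidence < patch_below:
        # Localized gap: patch the node without changing the topology.
        return Action.PATCH
    return Action.ACCEPT


print(self_heal_action(0.55))  # Action.PATCH
```

Under this rule a node scored at 0.55 is patched in place, while one scored at 0.1 triggers reconstruction of its subgraph.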

2605.28916 2026-06-01 astro-ph.IM cs.AI cs.HC

First head-to-head comparison of agentic AI applied to the analysis of simulated data of the Einstein Telescope

Gianluca Inguglia

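AI summary: The first direct comparison of two agentic AI systems, Claude Code and Codex, autonomously executing a gravitational-wave data analysis pipeline without human intervention, in terms of behavior, scientific results, and computational cost; it highlights key issues such as speed versus auditability and differences in instruction interpretation.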

Comments Version 2; includes the report autonomously written in PRD style by agentic AI systems as supplemental material

English abstract

We report a comparison of two state-of-the-art agentic AI systems, Claude Code (Anthropic) and Codex (OpenAI), tasked with autonomously executing a simple end-to-end gravitational wave data analysis pipeline on a shared computing infrastructure without human intervention. The pipeline comprises power spectral density estimation from raw Einstein Telescope simulated noise, geometric template bank generation, matched filter recovery of 100 binary black hole signal injections, automated results generation, and large language model-assisted production of a manuscript formatted in the style of Physical Review D. Both agents received identical written specifications and identical compute resources. The experiment was run twice: a first run with unrealistically loud injections, and a second run with signals rescaled to a physically motivated SNR range. The scientific results converged in both runs. However, the agents exhibited substantially different behaviors and computational costs: Claude Code completed the pipeline in ~3.4 minutes with silent deviations from the specification, while Codex required ~16 minutes across explicit self-correcting restarts, including an unsolicited performance optimization of the matched filter inner loop. The autonomously generated manuscripts also diverged in length, details, and quality. In the second run, a subtle difference in the interpretation of the SNR range instruction led to a genuine scientific divergence: Claude Code silently reinterpreted the instructions, while Codex followed the specification literally. We discuss the implications of these behavioral differences, such as speed versus auditability, silent versus transparent error handling, instruction interpretation, and the criticality of intermediate data representations in multi-model pipelines, for the deployment of agentic AI in scientific computing workflows.

2603.27052 2026-06-01 cs.CY cs.AI

Multi-Level Barriers to Generative AI Adoption Across Disciplines and Professional Roles in Higher Education

Jianhua Yang, Kerem Öge, Adrian von Mühlenen, Abdullah Bilal Akbulut, Tanya Suzanne Carey, Chidi Okorro

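AI summary: A multi-method survey analysis of 272 academic and professional services staff at a Russell Group university shows that non-STEM academics mainly report ethical and cultural barriers related to academic integrity, whereas STEM and professional services staff emphasize institutional, governance, and infrastructure constraints, indicating that barriers to GenAI adoption are deeply embedded in organizational ecosystems and epistemic norms.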

Comments 21 pages, 3 figures, 6 tables

Journal ref: Educ. Sci. 2026, 16(6), 838

English abstract

Generative Artificial Intelligence (GenAI) is rapidly reshaping higher education, yet barriers to its adoption across different disciplines and institutional roles remain underexplored. Existing literature frequently attributes adoption barriers to individual-level factors such as perceived usefulness and ease of use. This study instead investigates whether such barriers are structurally produced. Drawing on a multi-method survey analysis of 272 academic and professional services (PSs) staff at a Russell Group university, we examine how disciplinary contexts and institutional roles shape perceived barriers. By integrating multinomial logistic regression (MLR), structural equation modelling (SEM), and semantic clustering of open-ended responses, we move beyond descriptive accounts to provide a multi-level explanation of GenAI adoption. Our findings reveal clear, systematic differences: non-STEM academics primarily report ethical and cultural barriers related to academic integrity, whereas STEM and PSs staff disproportionately emphasize institutional, governance, and infrastructure constraints. We conclude that GenAI adoption barriers are deeply embedded in organizational ecosystems and epistemic norms, suggesting that universities must move beyond generalized training to develop role-specific governance and support frameworks.

2602.20176 2026-06-01 q-bio.BM cs.LG

Cross-Chirality Generalization by Axial Vectors for Hetero-Chiral Protein-Peptide Interaction Design

Ziyi Yang, Zitong Tian, Yinjun Jia, Tianyi Zhang, Jiqing Zheng, Hao Wang, Yubu Su, Juncai He, Lei Liu, Yanyan Lan

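AI summary: Proposes injecting axial features into $E(3)$-equivariant (polar) vector features and, combined with a latent diffusion model, achieves cross-chirality generalization from homo-chiral training data to hetero-chiral design tasks; it provides the first wet-lab validation of generative AI for the de novo design of D-peptide binders.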

Comments v3: Revised acknowledgements only. The paper has been accepted to ICML 2026

English abstract

D-peptide binders targeting L-proteins have promising therapeutic potential. Despite rapid advances in machine learning-based target-conditioned peptide design, generating D-peptide binders remains largely unexplored. In this work, we show that by injecting axial features into $E(3)$-equivariant (polar) vector features, it is feasible to achieve cross-chirality generalization from homo-chiral (L-L) training data to hetero-chiral (D-L) design tasks. By implementing this method within a latent diffusion model, we achieved D-peptide binder design that not only outperforms existing tools in in silico benchmarks, but also demonstrates efficacy in wet-lab validation. To our knowledge, our approach represents the first wet-lab validated generative AI for the de novo design of D-peptide binders, offering new perspectives on handling chirality in protein design. Code is available at https://github.com/YZY010418/PepMirror
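Why axial features can carry chirality information follows from a standard fact of vector algebra, independent of the paper's model: under a mirror reflection R (det R = -1), polar vectors such as bond displacements transform as v → Rv, while an axial vector such as a cross product transforms as (Ra) × (Rb) = det(R) R(a × b), picking up an extra sign. The sketch below checks this numerically; the vectors and function names are arbitrary illustrations, and nothing here reproduces how the authors inject axial features into their network.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def mirror_x(v):
    """Reflect a vector through the yz-plane (an improper transformation, det = -1)."""
    return (-v[0], v[1], v[2])


a = (1.0, 2.0, 0.5)  # two polar (ordinary displacement) vectors
b = (0.0, 1.0, 3.0)
c = cross(a, b)      # an axial vector

c_from_mirrored_inputs = cross(mirror_x(a), mirror_x(b))
as_if_polar = mirror_x(c)  # what c would become if it were a polar vector
print(c_from_mirrored_inputs)  # (5.5, 3.0, -1.0)
print(as_if_polar)             # (-5.5, -3.0, 1.0)
```

The two results differ by an overall sign, so a feature built from axial vectors changes differently under mirroring than polar features do, which is the kind of handedness signal that purely polar features do not provide.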

2510.15340 2026-06-01 quant-ph cs.LG cs.SY eess.SY

Singularity-free dynamical invariants-based quantum control

Ritik Sareen, Akram Youssry, Alberto Peruzzo

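AI summary: For state preparation in non-Markovian open quantum systems, proposes a generalized invariant-based protocol that reduces the finite-dimensional control problem to a single-qubit problem, constructs a family of bounded pulses, and selects the member that best suppresses noise, yielding high-fidelity, hardware-feasible control.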

English abstract

State preparation is a cornerstone of quantum technologies, underpinning applications in computation, communication, and sensing. Its importance becomes even more pronounced in non-Markovian open quantum systems, where environmental memory and model uncertainties pose significant challenges to achieving high-fidelity control. Invariant-based inverse engineering provides a principled framework for synthesizing analytic control fields, yet existing parameterizations often lead to experimentally infeasible, singular pulses and are limited to simplified noise models such as those of Lindblad form. Here, we introduce a generalized invariant-based protocol for finite-dimensional state preparation under arbitrary noise conditions. We transform the finite-dimensional control problem into an equivalent single-qubit problem by restricting the dynamics to a designed SU(2) subspace. The control protocol then proceeds in two stages: first, we construct a family of bounded pulses that achieve perfect state preparation in a closed system; second, we identify the optimal member of this family that minimizes the effect of noise. The framework accommodates both (i) characterized noise, enabling noise-aware control synthesis, and (ii) uncharacterized noise, where a noise-agnostic variant preserves robustness without requiring a master-equation description. Numerical simulations demonstrate high-fidelity state preparation across diverse targets while producing smooth, hardware-feasible control fields. This singularity-free framework extends invariant-based control to realistic open-system regimes, providing a versatile route toward robust quantum state engineering on NISQ hardware and other platforms exhibiting non-Markovian dynamics.

2605.25773 2026-06-01 stat.ML cs.AI cs.CL cs.LG

Efficient Benchmarking Is Just Feature Selection and Multiple Regression

Sam Bowyer, Acyr Locatelli, Kris Cao

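AI summary: Reframes efficient benchmarking as multiple regression with feature selection, using kernel ridge regression for prediction and the mRMR feature-selection algorithm, which improves prediction accuracy and ranking correlation while lowering computational cost.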

Comments 36 pages, 27 figures

English abstract

Efficient benchmarking techniques aim to lower the computational cost of evaluating LLMs by predicting full benchmark scores using only a subset of a benchmark's questions. By reframing this problem as an instance of multiple regression with feature selection, we find that existing efficient benchmarking methods can be greatly improved by simply using kernel ridge regression at the prediction stage. Additionally, using an information-theoretic feature-selection algorithm called minimum redundancy maximum relevance (mRMR), we can further improve upon these methods by selecting question subsets that will be maximally useful for prediction. Except in very data-poor settings, these approaches consistently achieve smaller prediction errors (in both MAE and RMSE), and greater ranking correlation between predicted and true scores (in both Spearman $ρ$ and Kendall $τ$) across a range of benchmarks using both binary and continuous metrics. Furthermore, mRMR subsampling is much faster than competitor methods (which often involve fitting probabilistic models or running clustering algorithms), and is more likely to select the same questions under different random seeds or training data splits. Tutorial code can be found at https://github.com/sambowyer/mrmr_eval .

2605.24863 2026-06-01 eess.AS cs.SD

Rethinking Continual Learning for Speech and Audio: A Representation-Centric Taxonomy and Open Problems

Yang Xiao, Siyi Wang, Eun-Jung Holden, Ting Dang

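AI summary: Revisits continual learning for speech from a representation-centric perspective, proposes a new taxonomy based on how representation geometry evolves, identifies key mismatches between existing assumptions and the behavior of speech foundation models, and outlines open challenges and future directions.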

Comments 4 pages, 1 figure, work in progress

English abstract

Speech and audio systems operate in inherently non-stationary environments, yet continual learning (CL) research in this domain, especially in the foundation model era, remains fragmented and fails to account for the coupled, geometry-sensitive nature of acoustic representations. Modern speech foundation models operate over highly entangled, continuous representations that jointly encode linguistic, speaker, and paralinguistic factors within a shared latent space. CL is therefore fundamentally about preserving and evolving shared representation structure rather than retaining isolated task knowledge. In this work, we revisit CL for speech from a representation-centered perspective, and introduce a new taxonomy that organizes CL according to how underlying representation geometry evolves under non-stationary acoustic conditions. We further identify key mismatches between current CL assumptions and speech foundation model behavior, and finally outline a set of open challenges and future research directions.

2605.24535 2026-06-01 cs.CR cs.LG

Steering Beyond the Support: Adversarial Training on Unsupervised Jailbroken Activation Simulation

Luoyu Chen, Weiqi Wang, Zhiyi Tian, Chenhan Zhang, Feng Wu, Jianhuan Huang, Ahmed Asiri, Shui Yu

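AI summary: To address the failure of existing safety-steering methods to generalize to unseen jailbreak attacks, proposes a bi-level adversarial training framework based on unsupervised latent direction discovery: it simulates diverse jailbroken activations by extrapolating from refusal-state harmful-request activations and trains a potential-induced steering field for zero-shot jailbreak defense.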

Comments accepted by ICML 2026

English abstract

Jailbreak prompts can trigger harmful completions on aligned LLMs. In response, safety steering has been proposed: test-time activation interventions that steer jailbreak activations to trigger refusal while preserving benign utility. However, existing steering methods are fundamentally supervised and tied to a static, limited training set, whereas real jailbreaks evolve and are often out-of-distribution relative to the training set, leading to failures on unseen attacks. In this paper, we tackle this failure on unseen jailbreaks, building on unsupervised latent direction discovery. We propose a bi-level adversarial training framework for zero-shot jailbreak defense. In the inner step, we simulate diverse jailbroken activations by extrapolating from refusal-state harmful-request activations via unsupervised latent direction discovery, which expands the coverage of real jailbreak activation subspaces. In the outer step, we train a potential-induced steering field to push these adversarial jailbroken states into refusal regions while keeping benign activations unchanged. Across three LLMs and six classical jailbreak families, our method achieves strong defense with attack success rates mostly below 5%, and rising subspace coverage throughout training helps explain the improved generalization.

2501.02672 2026-06-01 stat.ML cs.LG econ.EM stat.ME

Re-examining Granger Causality with Causal Bayesian Networks and Reichenbach's Principles

S. A. Adedayo

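AI summary: Reinterprets Granger causality through Reichenbach's principles and causal Bayesian networks, proposing the causalized Granger causality (c-GC) algorithm, which gives Granger causality a robust causal interpretation and yields satisfactory results on synthetic data.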

English abstract

Characterising cause-effect relationships in complex systems is fundamental to understanding their underlying mechanisms. Granger causality (GC) remains a widely used computational tool for identifying causal relationships in time series data. However, like other causal discovery methods, GC has limitations and has been criticised for lacking a rigorous causal foundation. In this work, we address this criticism by reinterpreting GC through the lens of Reichenbach's principles and causal Bayesian networks. This reinterpretation is implemented as an algorithm we call causalized Granger causality (c-GC). We demonstrate, both theoretically and graphically, that this reformulation endows GC with a robust causal interpretation under specific assumptions. c-GC yields satisfactory results on synthetic data, offering a more principled framework for causal discovery in observational datasets.
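For reference, the classical test that c-GC reinterprets compares an autoregression of y on its own past with one that also includes the past of x. The sketch below implements the textbook one-lag, two-variable F test in plain Python on simulated data; the lag order, the data-generating process, and the function names are assumptions for illustration, not the paper's c-GC algorithm.

```python
import random


def _rss(rows, y):
    """Residual sum of squares of an ordinary least-squares fit y ~ rows."""
    p = len(rows[0])
    # Normal equations (X^T X) beta = X^T y, solved by Gauss-Jordan elimination.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda i: abs(A[i][c]))  # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for i in range(p):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [aij - f * acj for aij, acj in zip(A[i], A[c])]
                b[i] -= f * b[c]
    beta = [b[i] / A[i][i] for i in range(p)]
    return sum(
        (yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2
        for r, yi in zip(rows, y)
    )


def granger_f(x, y):
    """F statistic for 'x Granger-causes y' with a single lag.

    Restricted:   y_t ~ 1 + y_{t-1}
    Unrestricted: y_t ~ 1 + y_{t-1} + x_{t-1}
    """
    target = y[1:]
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]
    unrestricted = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    rss_r, rss_u = _rss(restricted, target), _rss(unrestricted, target)
    n, q, k = len(target), 1, 3  # observations, dropped lags, full-model params
    return ((rss_r - rss_u) / q) / (rss_u / (n - k))


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(500)]
y = [0.0]
for t in range(1, 500):
    # y depends on the previous value of x, but x ignores y.
    y.append(0.5 * y[-1] + 0.8 * x[t - 1] + 0.3 * rng.gauss(0.0, 1.0))

print(f"x -> y: F = {granger_f(x, y):.1f}")
print(f"y -> x: F = {granger_f(y, x):.1f}")
```

Expect a large F for x -> y and a value near 1 for y -> x. This purely predictive criterion is the one the abstract says lacks a rigorous causal foundation; reading it through Reichenbach's principles and a causal Bayesian network is the paper's contribution and is not reproduced here.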

2605.19233 2026-06-01 cs.CR cs.LG quant-ph

Quantum Machine Learning for Cyber-Physical Anomaly Detection in Unmanned Aerial Vehicles: A Leakage-Free Evaluation with Proxy-Audited Feature Sets

Carlos A. Durán Paredes, Javier E. León Calderón, Nicolás Sánchez Perea, Germán Darío Díaz, Camilo Segura Quintero

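AI summary: For cyber-physical attacks on UAVs, proposes a leakage-free evaluation framework that combines a group-aware temporal protocol, a three-mode feature audit, and a hybrid XGBoost + Data Reuploading classifier, and reports an incremental benefit for the quantum-enhanced hybrid approach.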

Comments 10 pages, 7 figures, 1 table; open Qiskit 2.x implementation available at https://github.com/Carlosandp/TLM-UAV-Quantum-Anomaly-Detection

English abstract

Unmanned aerial vehicles (UAVs) are cyber-physical systems whose attack surface spans networked avionics and on-board sensor fusion: a compromised GPS or battery module can mimic a benign mission segment and evade naive anomaly detectors. We present a leakage-free evaluation of quantum machine learning for UAV anomaly detection on the multi-sensor TLM:UAV benchmark. Three contributions support the study. (i) A group-aware temporal protocol (B2) partitions the dataset into ten contiguous TimeUS blocks and evaluates over ten seeds, eliminating the inflation produced by random stratified splits that mix neighbouring samples. (ii) A three-mode feature audit (full/loose/strict) quantifies how much accuracy stems from instantaneous physical signals versus contextual proxies (cumulative energy, battery state, GPS trajectory). (iii) A hybrid XGBoost + Data Reuploading (DRU) classifier is benchmarked against five paired non-linear controls (raw, PCA, polynomial-2, random-RBF, and an untrained DRU map) under identical budgets. The standalone DRU does not consistently match the strongest classical baseline across seeds; however, the trained-DRU hybrid is the only model whose mean F1 macro shifts upward from full to strict (+0.05), a directional signal that the per-seed standard deviations prevent from being interpreted as a statistically established difference. The trained-DRU hybrid also records the lowest mean false-alarm rate under proxy-free evaluation, subject to the inter-seed variance reported. We frame this as an incremental, reproducible quantum-enhanced hybrid benefit, and provide an open Qiskit 2.x implementation as a benchmark for cybersecurity analytics in NISQ-era aerospace systems.
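A group-aware temporal split of the kind described in (i) can be sketched as follows, assuming samples are already sorted by `TimeUS` and that each contiguous block is held out in turn. The leave-one-block-out rotation and the function names are assumptions, since the abstract does not spell out how protocol B2 assigns blocks to training and test.

```python
def contiguous_blocks(n_samples, n_blocks=10):
    """Split indices 0..n_samples-1 (assumed sorted by TimeUS) into contiguous blocks."""
    if not 0 < n_blocks <= n_samples:
        raise ValueError("need 1 <= n_blocks <= n_samples")
    base, extra = divmod(n_samples, n_blocks)
    blocks, start = [], 0
    for b in range(n_blocks):
        size = base + (1 if b < extra else 0)  # spread the remainder over the first blocks
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks


def leave_one_block_out(blocks):
    """Yield (train, test) index lists, holding out one whole time block at a time.

    Samples inside a block always stay together, unlike a random stratified
    split that scatters temporal neighbours across train and test.
    """
    for i, test in enumerate(blocks):
        train = [idx for j, block in enumerate(blocks) if j != i for idx in block]
        yield train, test


blocks = contiguous_blocks(23)
print([len(b) for b in blocks])  # [3, 3, 3, 2, 2, 2, 2, 2, 2, 2]
```

Repeating such a protocol over several random seeds, as the abstract describes, would vary the model initialization rather than the block boundaries, which stay fixed by time.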

2602.03012 2026-06-01 cs.CR cs.AI

CVE-Factory: Scaling Expert-Level Agentic Tasks for Code Security Vulnerability

Xianzhen Luo, Jingyuan Zhang, Shiqi Zhou, Jinyang Huang, Chuan Xiao, Qingfu Zhu, Zhiyuan Ma, Xing Yue, Yang Yue, Wencong Zeng, Wanxiang Che

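AI summary: Proposes CVE-Factory, a multi-agent framework that automatically turns sparse CVE metadata into executable agentic tasks, and uses it to build LiveCVEBench, a continuously updated benchmark, together with a training dataset; models fine-tuned on this data improve substantially.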

Comments Accepted by ICML2026 Oral

English abstract

Evaluating and improving the security capabilities of code agents requires high-quality, executable vulnerability tasks. However, existing works rely on costly, unscalable manual reproduction and suffer from outdated data distributions. To address these, we present CVE-Factory, the first multi-agent framework to achieve expert-level quality in automatically transforming sparse CVE metadata into fully executable agentic tasks. Cross-validation against human expert reproductions shows that CVE-Factory achieves 95% solution correctness and 96% environment fidelity, confirming its expert-level quality. It is also evaluated on the latest realistic vulnerabilities and achieves a 66.2% verified success. This automation enables two downstream contributions. First, we construct LiveCVEBench, a continuously updated benchmark of 190 tasks spanning 14 languages and 153 repositories that captures emerging threats including AI-tooling vulnerabilities. Second, we synthesize over 1,000 executable training environments, the first large-scale scaling of agentic tasks in code security. Fine-tuned Qwen3-32B improves from 5.3% to 35.8% on LiveCVEBench, surpassing Claude 4.5 Sonnet, with gains generalizing to Terminal Bench (12.5% to 31.3%). We open-source CVE-Factory, LiveCVEBench, Abacus-cve (fine-tuned model), training dataset, and leaderboard. All resources are available at https://github.com/livecvebench/CVE-Factory .

2605.17126 2026-06-01 stat.ML cs.LG stat.ME

Multi-task Linear Regression without Eigenvalue Lower Bounds: Adaptivity, Robustness, and Safety

Seok-Jin Kim

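AI summary: For multi-task linear regression with contaminated tasks, proposes an estimator based on matrix-weighted norm regularization and introduces a relative balancedness condition; under weaker spectral assumptions it matches the prediction-error bounds of existing methods and comes with a safety guarantee.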

Comments Accepted at ICML 2026

English abstract

We study the multi-task linear regression problem in the presence of contaminated tasks. We address the setting where the unknown parameters of a majority of tasks are close in the $\ell_2$-norm, while a fraction of tasks are arbitrary outliers. Existing theoretical frameworks for this problem rely heavily on the assumption that the empirical second moment of each task has a minimum eigenvalue bounded away from zero (order $\Omega(1)$). Crucially, this assumption fails in many high-dimensional scenarios, rendering prior guarantees vacuous. To overcome this limitation, we propose an estimator based on matrix-weighted norm regularization. We also introduce a relative balancedness condition, quantified by a balancedness constant, that compares each task's second moment with the average inlier geometry and relaxes the need for taskwise second-moment lower bounds. In favorable regimes with moderate balancedness, our prediction MSE bounds match the rate of Duan and Wang (2023) under substantially weaker spectral assumptions; the resulting task-overall MSE is minimax optimal up to logarithmic factors. Furthermore, we demonstrate that our estimator enjoys a safety guarantee: when the relevant balancedness constant is large or infinite, or when tasks are unrelated, the method performs no worse than independent task learning.

2605.11336 2026-06-01 cs.IR cs.AI cs.CL cs.HC

Much of Geospatial Web Search Is Beyond Traditional GIS

Ilya Ilyankou, Stefano Cavazzi, James Haworth

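AI summary: Using dense sentence embeddings, a SetFit classifier, and density-based clustering, finds that 18% of queries in the MS MARCO corpus are geospatial and builds an 88-category taxonomy, showing that geospatial search is dominated by transactional and practical lookups, many of which fall outside the scope of traditional GIS and knowledge graphs.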

English abstract

Web search queries concern place far more often than existing labelling schemes suggest, yet the landscape of geospatial web search queries - what people ask of place, and how often - remains poorly characterised at scale. We apply dense sentence embeddings, a lightweight SetFit classifier, and density-based clustering to the full MS MARCO corpus of 1.01 million real Bing queries without prior filtering for toponyms or spatial keywords, identifying 181,827 geospatial queries (18.0%), nearly threefold the 6.17% labelled as Location in the original annotations. The resulting taxonomy of 88 query categories reveals that geospatial web search is dominated by transactional and practical lookups: costs and prices alone account for 15.3% of geospatial queries, nearly twice the size of the entire physical geography theme. Much of this activity - costs, opening hours, contact details, weather, travel recommendations - falls outside the scope of what traditional GIS and knowledge graphs are built to serve. The categories vary substantially in the kind of answer they admit, from deterministic lookups answerable from spatial databases or knowledge graphs to evaluative or temporally volatile queries that require generative or real-time systems. We discuss implications for hybrid retrieval architectures and for benchmarks of geographic reasoning in large language models. We openly release the labelled dataset, classifier, and taxonomy.

2605.02125 2026-06-01 cs.DC cs.LG

FedQueue: Queue-Aware Federated Learning for Cross-Facility HPC Training

Yijiang Li, Emon Dey, Zilinghan Li, Krishnan Raghavan, Ravi Madduri, Kibaek Kim

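AI summary: Proposes FedQueue, which handles stochastic scheduler delays in federated learning across HPC facilities through online queue-delay prediction, cutoff-based admission control, and staleness-aware aggregation; it comes with a convergence guarantee for non-convex objectives and shows a 20.5% improvement in real-world deployment.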

Abstract

Federated learning (FL) across multiple HPC facilities faces stochastic admission delays from batch schedulers that dominate wall-clock time. Synchronous FL suffers from severe stragglers, while asynchronous FL accumulates stale updates when queues spike. We propose FedQueue, a queue-aware FL protocol that incorporates scheduler delays directly into training and aggregation. It (i) predicts per-facility queue delays online to budget local work, (ii) applies cutoff-based admission that buffers late arrivals to bound staleness, and (iii) performs staleness-aware aggregation to stabilize heterogeneous local workloads. We prove convergence for non-convex objectives at rate $\mathcal{O}(1/\sqrt{R})$ under bounded staleness, and show that the admission controls yield bounded staleness with high probability under queue-prediction error. A real-world cross-facility deployment of FedQueue shows a 20.5% improvement over baseline algorithms. Controlled queue simulations demonstrate robust improvements over the baselines, in particular up to a 60% reduction in time to reach a target accuracy level under high queue variance and non-IID partitions.
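Steps (ii) and (iii) can be illustrated with a minimal sketch. Several details here are assumptions for illustration, not FedQueue's actual rules:

- the admission rule (drop updates beyond a staleness bound, buffer those arriving after the cutoff);
- the polynomial staleness discount `(1 + staleness) ** -alpha`;
- the dict-based update format and the function names `admit` and `aggregate`.

```python
import numpy as np

def admit(updates, round_idx, cutoff, max_staleness):
    """Split client arrivals into admitted and buffered updates (hypothetical rule).

    Each update is a dict with 'delta' (np.ndarray), 'base_round' (the global
    round the client trained from) and 'arrival' (seconds since round start).
    """
    admitted, buffered = [], []
    for u in updates:
        if round_idx - u["base_round"] > max_staleness:
            continue  # too stale: drop it so staleness stays bounded
        # On-time arrivals join this round; late ones wait for a later aggregation.
        (admitted if u["arrival"] <= cutoff else buffered).append(u)
    return admitted, buffered

def aggregate(global_w, admitted, round_idx, alpha=0.5, lr=1.0):
    """Staleness-aware averaging: weight each update by (1 + staleness) ** -alpha."""
    if not admitted:
        return global_w
    weights = np.array([(1.0 + round_idx - u["base_round"]) ** -alpha for u in admitted])
    weights /= weights.sum()  # normalize so the weights form a convex combination
    step = sum(w * u["delta"] for w, u in zip(weights, admitted))
    return global_w + lr * step
```

With `alpha=0` this reduces to plain averaging of the admitted updates. Larger values discount stale work more aggressively.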

2512.03706 2026-06-01 physics.comp-ph cs.LG math.DS

Consistent Projection of Langevin Dynamics: Preserving Thermodynamics and Kinetics in Coarse-Grained Models

Vahid Nateghi, Lara Neureither, Selma Moqvist, Carsten Hartmann, Simon Olsson, Feliks Nüske

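AI summary: Proposes a projection-based coarse-graining formalism that combines generator Extended Dynamic Mode Decomposition (gEDMD) with thermodynamic interpolation, and accurately captures the thermodynamic and kinetic properties of the full-space model.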

Abstract

Coarse graining (CG) is an important task for efficient modeling and simulation of complex multi-scale systems, such as the conformational dynamics of biomolecules. This work presents a projection-based coarse-graining formalism for general underdamped Langevin dynamics. Following the Zwanzig projection approach, we derive a closed-form expression for the coarse-grained dynamics. In addition, we show how the generator Extended Dynamic Mode Decomposition (gEDMD) method, which was developed in the context of Koopman operator methods, can be used to model the CG dynamics and evaluate its kinetic properties, such as transition timescales. Finally, we combine our approach with thermodynamic interpolation (TI), a generative approach to transform samples between thermodynamic conditions, to extend the scope of the approach across thermodynamic states without repeated numerical simulations. Using a two-dimensional model system, we demonstrate that the proposed method accurately captures the thermodynamic and kinetic properties of the full-space model.
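To make the gEDMD step concrete, here is a minimal sketch for a one-dimensional overdamped Ornstein-Uhlenbeck process, $dX = -\theta X\,dt + \sigma\,dW$. This is an assumption chosen for illustration; the paper treats underdamped Langevin dynamics. The monomial basis, parameters and function name are also illustrative.

The procedure has three parts:

- gEDMD applies the generator $\mathcal{L} = b(x)\,\partial_x + \tfrac{1}{2}\sigma^2\,\partial_x^2$ to each basis function at the sample points.
- It fits a matrix by least squares.
- That matrix's eigenvalues approximate those of $\mathcal{L}$. For this process they are $0, -\theta, -2\theta, \dots$, giving implied timescales $1/(k\theta)$.

```python
import numpy as np

def gedmd_ou(samples, theta, sigma):
    """Estimate the generator of dX = -theta X dt + sigma dW on the basis {1, x, x^2}."""
    x = np.asarray(samples, dtype=float)
    # Basis functions evaluated at the samples: shape (m, 3).
    psi = np.stack([np.ones_like(x), x, x**2], axis=1)
    # Generator applied to each basis function: L f = b f' + 0.5 sigma^2 f''.
    b = -theta * x
    dpsi = np.stack([
        np.zeros_like(x),          # L 1   = 0
        b,                         # L x   = b(x)
        2.0 * x * b + sigma**2,    # L x^2 = 2 x b(x) + sigma^2
    ], axis=1)
    # Least-squares fit dpsi ~= psi @ K; eigenvalues of K approximate those of L.
    K, *_ = np.linalg.lstsq(psi, dpsi, rcond=None)
    return K

rng = np.random.default_rng(0)
samples = rng.normal(scale=1.0, size=500)
K = gedmd_ou(samples, theta=2.0, sigma=0.5)
eigvals = np.sort(np.linalg.eigvals(K).real)
timescales = -1.0 / eigvals[:-1]   # skip the zero eigenvalue (stationary mode)
```

The fit is exact here only because polynomials of degree at most two are mapped into themselves by this particular generator. For nonlinear drifts or richer dynamics, the basis choice controls the approximation error.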

2605.06235 2026-06-01 cs.IR cs.AI

OBLIQ-Bench: Exposing Overlooked Bottlenecks in Modern Retrievers with Latent and Implicit Queries

Diane Tchuindjo, Devavrat Shah, Omar Khattab

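AI summary: Motivated by the gap between saturating retrieval benchmarks and unsolved real-world search problems, proposes a class of "oblique queries" and builds the OBLIQ-Bench benchmark. It reveals an asymmetry between retrieval and verification: reasoning LLMs reliably recognize latent relevance, but retrieval pipelines fail to surface most relevant documents.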

Abstract

Retrieval benchmarks are increasingly saturating, but we argue that efficient search is far from a solved problem. We identify a class of queries we call oblique, which seek documents that instantiate a latent pattern. Examples include tweets that express an implicit stance, chat logs that demonstrate a particular failure mode, or transcripts that match an abstract scenario. We study three mechanisms through which obliqueness may arise and introduce OBLIQ-Bench, a suite of five oblique search problems over real long-tail corpora. OBLIQ-Bench exposes an overlooked asymmetry between retrieval and verification: reasoning LLMs reliably recognize latent relevance whenever relevant documents are surfaced, but even sophisticated retrieval pipelines fail to surface most relevant documents in the first place. We hope that OBLIQ-Bench will drive research into retrieval architectures that efficiently capture latent patterns and implicit signals in large corpora.

2604.23468 2026-06-01 math.MG cs.AI cs.LO math.NT

Progress in Formalizing Sphere Packing in Dimension 8

Sidharth Hariharan, Christopher Birkbeck, Seewoo Lee, Ho Kiu Gareth Ma, Bhavik Mehta, Auguste Poiroux, Maryna Viazovska

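AI summary: Describes the formal verification, in the Lean theorem prover, of Viazovska's solution to the sphere packing problem in dimension 8. It also discusses the collaboration with the autoformalization model Gauss and the remaining goals.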

Comments 8 pages, title updated

Abstract

In 2016, Viazovska famously solved the sphere packing problem in dimension $8$, using modular forms to construct a 'magic' function satisfying optimality conditions determined by Cohn and Elkies in 2003. In March 2024, Hariharan and Viazovska launched a project to formalize this solution and related mathematical facts in the Lean Theorem Prover. A significant milestone was achieved in February 2026: the result was formally verified, with the final stages of the verification done by Math, Inc.'s autoformalization model 'Gauss'. We discuss the techniques used to achieve this milestone, reflect on the unique collaboration between humans and Gauss, and discuss project objectives that remain.

2604.23436 2026-06-01 stat.ML cs.LG math.OC stat.CO

Inference of Online Newton Methods with Nesterov's Accelerated Sketching

Haoxuan Wang, Xinchen Du, Sen Na

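AI summary: To address the high computational cost of inference with online Newton methods, combines Hessian averaging with a sketch-and-project solver using Nesterov acceleration. This achieves the robustness of second-order methods at the $O(d^2)$ complexity of first-order methods. The paper establishes global convergence, asymptotic normality, and an online covariance estimator.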

Comments 52 pages, 2 tables, 3 figures; accepted at ICML 2026

Abstract

Reliable decision-making with streaming data requires principled uncertainty quantification of online methods. While first-order methods enable efficient iterate updates, their inference procedures still require updating proper (covariance) matrices, incurring $O(d^2)$ time and memory complexity, and are sensitive to ill-conditioning and noise heterogeneity of the problem. This costly inference task offers an opportunity for more robust second-order methods, which are, however, bottlenecked by solving Newton systems with $O(d^3)$ complexity. In this paper, we address this gap by studying an online Newton method with Hessian averaging. The Newton direction at each step is approximately computed using a sketch-and-project solver with Nesterov's acceleration, matching the $O(d^2)$ complexity of first-order methods. For the proposed method, we quantify its uncertainty arising from both random data and randomized computation. Under standard smoothness and moment conditions, we establish global almost-sure convergence, prove asymptotic normality of the last iterate with a limiting covariance characterized by a Lyapunov equation, and develop a fully online covariance estimator with non-asymptotic convergence guarantees. We also connect the resulting uncertainty quantification to that of exact and sketched Newton methods without Nesterov's acceleration. Extensive experiments on regression models demonstrate the superiority of the proposed method for online inference.
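The Newton-direction step can be sketched as follows:

- the Hessian estimate is averaged over the stream;
- the Newton system $\bar H d = -g$ is solved approximately with sketch-and-project using single-row sketches (randomized Kaczmarz).

This is a simplified illustration under our own assumptions. It omits Nesterov's acceleration and the paper's sketch and step-size choices, and the function names are hypothetical.

```python
import numpy as np

def average_hessian(h_bar, h_new, t):
    """Running mean of Hessian estimates: H_bar_t = ((t - 1) H_bar_{t-1} + H_t) / t."""
    return h_bar + (h_new - h_bar) / t

def sketch_and_project(A, b, x0, iters, rng):
    """Approximately solve A x = b, projecting onto one random row constraint per step."""
    x = x0.copy()
    row_norms = np.einsum("ij,ij->i", A, A)
    probs = row_norms / row_norms.sum()   # sample rows proportionally to squared norm
    for _ in range(iters):
        i = rng.choice(len(b), p=probs)
        # Project x onto the hyperplane {z : A[i] @ z = b[i]}; O(d) work per step.
        x += (b[i] - A[i] @ x) / row_norms[i] * A[i]
    return x

rng = np.random.default_rng(0)
d = 5
M = rng.normal(size=(d, d))
H = M @ M.T + d * np.eye(d)   # a well-conditioned SPD stand-in for the averaged Hessian
g = rng.normal(size=d)
direction = sketch_and_project(H, -g, np.zeros(d), iters=3000, rng=rng)
```

Each projection touches a single row, so a budget of $O(d)$ projections per step keeps the cost at $O(d^2)$ rather than the $O(d^3)$ of a direct solve. The accuracy of the resulting direction then depends on how many projections are run.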

2604.22794 2026-06-01 eess.SY cs.LG cs.SY

Accelerating Reinforcement Learning for Wind Farm Control via Expert Demonstrations

Marcus Binder Nilsen, Julian Quick, Tuhfe Göçmen, Nikolay Dimitrov, Pierre-Elouan Réthoré

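AI summary: Proposes a pretraining method that uses expert demonstrations generated with a steady-state wake model to initialize Soft Actor-Critic networks via behavior cloning. This eliminates the initial learning phase, bringing initial performance close to the baseline, and the agent surpasses a lookup-table controller after online fine-tuning.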

Comments Submitted to Journal of Physics: Conference Series (Torque 2026). This is the Accepted Manuscript version of an article accepted for publication in Journal of Physics: Conference Series. IOP Publishing Ltd is not responsible for any errors or omissions in this version of the manuscript or any version derived from it. This Accepted Manuscript is published under a CC BY licence

Journal ref
J. Phys.: Conf. Ser. 3224 032016 (2026)

Abstract

Reinforcement learning (RL) offers a promising approach for adaptive wind farm flow control, yet its practical deployment is hindered by slow training convergence and poor initial performance, factors that could translate to years of reduced power output if an untrained agent were deployed directly. This work investigates whether domain knowledge from steady-state wake models can accelerate RL training and improve initial controller performance. We propose a pretraining methodology in which expert demonstrations are generated by deploying a PyWake-based steady-state optimizer within a dynamic wake simulation (WindGym), then used to initialize both the actor and critic networks of a Soft Actor-Critic agent via behavior cloning. Experiments on a 2x2 wind farm show that pretraining eliminates the costly initial learning phase: while an untrained agent underperforms the greedy zero-yaw baseline by approximately 12%, pretraining raises initial performance to near-baseline levels. During online fine-tuning, all configurations converge within 250,000 environment steps to achieve similar performance, ultimately exceeding that of a lookup-table controller, which reaches approximately 7% power gain after 500,000 steps.
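Behavior cloning reduces to supervised regression from observed states to expert actions. The sketch below fits a linear policy to synthetic demonstrations by least squares. Several pieces are hypothetical stand-ins rather than the paper's setup:

- the linear policy and the `expert_yaw` rule, which replace the PyWake optimizer and the Soft Actor-Critic actor network;
- the two-feature state layout.

Critic initialization is not shown.

```python
import numpy as np

def expert_yaw(states):
    """Hypothetical expert: yaw offsets (degrees) for 4 turbines, linear in the state."""
    W_true = np.array([[2.0, -1.0], [1.5, 0.5], [0.0, 0.0], [0.0, 0.0]])
    return states @ W_true.T

def behavior_clone(states, actions):
    """Fit a linear policy a = W s + c by least squares on (state, action) pairs."""
    X = np.hstack([states, np.ones((len(states), 1))])   # append a bias column
    coef, *_ = np.linalg.lstsq(X, actions, rcond=None)
    W, c = coef[:-1].T, coef[-1]
    return lambda s: s @ W.T + c

rng = np.random.default_rng(1)
demo_states = rng.normal(size=(200, 2))   # e.g. normalized wind speed and direction
demo_actions = expert_yaw(demo_states)
policy = behavior_clone(demo_states, demo_actions)
```

In the paper's setting, the cloned policy serves only as the starting point; online RL fine-tuning then adjusts it in the dynamic simulation.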

2604.22722 2026-06-01 cs.IR cs.AI cs.LG

Aligning Dense Retrievers with LLM Utility via Distillation

Rajinder Sandhu, Di Mu, Cheng Chang, Md Shahriar Tasjid, Himanshu Rai, Maksims Volkovs, Ga Wu

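AI summary: Proposes the Utility-Aligned Embeddings (UAE) framework, which trains a bi-encoder by distilling an LLM's perplexity-reduction utility distribution. This improves the precision and efficiency of dense retrieval without adding test-time LLM inference cost.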

Abstract

Dense vector retrieval is the practical backbone of Retrieval-Augmented Generation (RAG), but similarity search can suffer from limited precision. Conversely, utility-based approaches leveraging LLM re-ranking often achieve superior performance but are computationally prohibitive and prone to noise inherent in perplexity estimation. We propose Utility-Aligned Embeddings (UAE), a framework designed to merge these advantages into a practical, high-performance retrieval method. We formulate retrieval as a distribution matching problem, training a bi-encoder with a Utility-Modulated InfoNCE objective to imitate a utility distribution derived from perplexity reduction. This approach injects graded utility signals directly into the embedding space without requiring test-time LLM inference. On the QASPER benchmark, UAE improves retrieval Recall@1 by 30.59%, MAP by 30.16% and Token F1 by 17.3% over the strong semantic baseline BGE-Base. Crucially, UAE is over 180x faster than efficient LLM re-ranking methods while preserving competitive performance, demonstrating that aligning retrieval with generative utility yields reliable contexts at scale.
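One way to read "distribution matching" is as a soft-label InfoNCE. A softmax over query-passage similarities is trained toward a target distribution derived from each passage's utility. The sketch below computes such a loss for one query. Two choices are our assumptions, and the exact Utility-Modulated InfoNCE objective may differ:

- utility is the reduction in the gold answer's log-perplexity when the passage is added;
- separate temperatures `tau` and `tau_u` are used for similarities and utilities.

```python
import numpy as np

def log_softmax(z):
    z = np.asarray(z, dtype=float)
    z = z - z.max()                          # shift for numerical stability
    return z - np.log(np.exp(z).sum())

def utility_infonce(sims, utilities, tau=0.05, tau_u=1.0):
    """Cross-entropy between a utility-derived target and the retriever's softmax.

    sims: similarities between one query and its candidate passages.
    utilities: assumed to be log-perplexity of the gold answer without the
    passage minus with it (higher = more useful).
    """
    target = np.exp(log_softmax(np.asarray(utilities, dtype=float) / tau_u))
    log_pred = log_softmax(np.asarray(sims, dtype=float) / tau)
    return float(-(target * log_pred).sum())

loss = utility_infonce(sims=[0.82, 0.40, 0.35], utilities=[1.2, 0.1, -0.3])
```

Compared with one-hot InfoNCE, the soft target lets moderately useful passages keep some probability mass instead of being treated as negatives.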

2604.09412 2026-06-01 stat.ML cond-mat.dis-nn cs.LG

Sharp description of local minima in the loss landscape of high-dimensional two-layer ReLU neural networks

Jie Huang, Bruno Loureiro, Stefano Sarao Mannelli

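AI summary: Gives a sharp characterization, via summary statistics, of local minima in the loss landscape of high-dimensional two-layer ReLU networks. It links these minima to one-pass SGD and shows how overparameterization affects their stability and reachability.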

Comments 29 pages, 18 figures. Accepted as a conference paper at ICML 2026

Abstract

We study the population loss landscape of two-layer ReLU networks of the form $\sum_{k=1}^K \mathrm{ReLU}(w_k^\top x)$ in a realisable teacher-student setting with Gaussian covariates. We show that local minima admit an exact low-dimensional representation in terms of summary statistics, yielding a sharp and interpretable characterisation of the landscape. We further establish a direct link with one-pass SGD: local minima correspond to attractive fixed points of the dynamics in summary statistics space. This perspective reveals a hierarchical organisation of minima into discrete families and shows how overparameterisation changes their stability and reachability under gradient-based dynamics. In this overparameterised regime, global minima become increasingly accessible, attracting the dynamics and reducing convergence to spurious solutions. Overall, our results reveal intrinsic limitations of common simplifying assumptions, which may miss essential features of the loss landscape even in minimal neural network models.
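For Gaussian inputs, the population loss depends on the weights only through the overlaps $w_k^\top w_l$, $w_k^\top w^*_j$ and $w^{*\top}_i w^*_j$. This is what makes a summary-statistics description possible.

The sketch below evaluates the squared population loss from those overlaps. It uses the standard closed form

$$\mathbb{E}[\mathrm{ReLU}(u^\top x)\,\mathrm{ReLU}(v^\top x)] = \frac{\|u\|\|v\|}{2\pi}\left(\sin\phi + (\pi-\phi)\cos\phi\right), \quad x \sim \mathcal{N}(0, I),$$

where $\phi$ is the angle between $u$ and $v$. The loss normalisation and function names are our assumptions.

```python
import numpy as np

def relu_kernel(gram_uv, norm_u, norm_v):
    """E[ReLU(u.x) ReLU(v.x)] for x ~ N(0, I), from the overlaps u.v and the norms."""
    norms = np.outer(norm_u, norm_v)
    cos_phi = np.clip(gram_uv / norms, -1.0, 1.0)
    phi = np.arccos(cos_phi)
    return norms / (2 * np.pi) * (np.sin(phi) + (np.pi - phi) * cos_phi)

def population_loss(W, W_star):
    """E[(sum_k ReLU(w_k.x) - sum_j ReLU(w*_j.x))^2], computed from summary statistics only."""
    # Student-student, student-teacher and teacher-teacher overlap matrices.
    Q, M, P = W @ W.T, W @ W_star.T, W_star @ W_star.T
    q, p = np.sqrt(np.diag(Q)), np.sqrt(np.diag(P))
    return (relu_kernel(Q, q, q).sum()
            - 2 * relu_kernel(M, q, p).sum()
            + relu_kernel(P, p, p).sum())
```

Because only `Q`, `M` and `P` enter, any two students with the same overlaps have the same loss, whatever the input dimension. In particular, permuting the hidden units leaves the loss unchanged.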

2604.02969 2026-06-01 stat.ML cs.LG stat.CO stat.ME

Inversion-Free Natural Gradient Descent on Riemannian Manifolds

Dario Draca, Takuo Matsubara, Minh-Ngoc Tran

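AI summary: For statistical models whose parameters lie on general Riemannian manifolds, proposes an intrinsic, inversion-free natural gradient method. It avoids matrix inversion through a moving approximation of the inverse Fisher information matrix, maintained on the manifold and updated with low-rank corrections, and proves almost-sure convergence rates for the iterates.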

Comments 80 pages, 4 figures. Updated empirical examples

Abstract

The natural gradient method is a central tool for statistical optimisation, but its broader application is hindered by the assumption of a Euclidean parameter space, the repeated estimation of the Fisher information matrix (FIM), and the computational cost of its subsequent inversion. This paper proposes an intrinsic, inversion-free natural gradient method for statistical models whose parameters lie on general Riemannian manifolds. Formulating statistical optimisation in this non-Euclidean setting allows for the natural enforcement of parameter constraints, the elimination of non-identifiable parameters, and the exploitation of geodesic convexity. Our algorithm is based on a moving approximation of the inverse FIM, which is maintained directly on the manifold. This approximation is efficiently updated with new score vectors using low-rank matrix identities. We prove almost-sure convergence rates of $O(\log s / s^{\alpha})$ for the sequence of iterates, and a similar rate for the approximate FIM. A limited-memory variant with sub-quadratic storage complexity is further proposed for large-scale applications. We demonstrate the efficacy of our method on variational Bayes within the Bures-Wasserstein manifold, normalising flows on the Stiefel manifold, and reduced-rank logistic regression.
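In the Euclidean special case, the inversion-free idea can be illustrated with a Sherman-Morrison update. Suppose the FIM estimate is a moving average of score outer products, $F_s = (1-\gamma)F_{s-1} + \gamma\, g g^\top$. Then its inverse can be updated in $O(d^2)$ without forming or inverting $F_s$.

The sketch below makes that assumption. The paper's approximation is maintained on the manifold, which is not shown here, and its exact weighting may differ. The function names are hypothetical.

```python
import numpy as np

def update_inverse_fim(F_inv, score, gamma):
    """Inverse of (1 - gamma) F + gamma * score score^T, given a symmetric F_inv = F^{-1}.

    Uses Sherman-Morrison on F + c * score score^T with c = gamma / (1 - gamma).
    """
    c = gamma / (1.0 - gamma)
    Fg = F_inv @ score
    F_inv_new = F_inv - c * np.outer(Fg, Fg) / (1.0 + c * score @ Fg)
    return F_inv_new / (1.0 - gamma)

def natural_gradient_step(theta, grad, F_inv, lr):
    """Euclidean natural-gradient step theta - lr * F^{-1} grad, with no matrix inversion."""
    return theta - lr * F_inv @ grad
```

A limited-memory variant would avoid storing the dense $d \times d$ matrix. That variant is not sketched here.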

2603.29972 2026-06-01 stat.ME cs.LG econ.EM stat.ML

Do covariates explain why these groups differ? The choice of reference group can reverse conclusions in the Oaxaca-Blinder decomposition

Manuel Quintero, Advik Shreekumar, William T. Stephenson, Tamara Broderick

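AI summary: Shows theoretically and empirically that the choice of reference group in the Oaxaca-Blinder decomposition can lead to substantively different conclusions, and that the problem is more common with complex regression models. It recommends that researchers report the decomposition in both directions.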

Comments 28 pages, 4 figures

Abstract

Scientists often want to explain why an outcome is different in two groups. For instance, differences in patient mortality rates across two hospitals could be due to differences in the patients themselves (covariates) or differences in medical care (outcomes given covariates). The Oaxaca-Blinder decomposition (OBD) is a standard tool to tease apart these factors. It is well known that the OBD requires choosing one of the groups as a reference, and the numerical answer can vary with the reference. To the best of our knowledge, there has been no systematic investigation into whether the choice of OBD reference can yield different substantive conclusions and how common this issue is. In the present paper, we give existence proofs in real and simulated data that the two choices of OBD reference can in fact yield substantively different conclusions. Our empirical exercises find that this sensitivity is more common when the OBD is extended to more complex regression models, including a pretrained transformer. Our theoretical and empirical results together establish that these conclusion reversals are not entirely driven by model misspecification, small data, or adversarial parameter choices. Our results suggest three things: practitioners should always report both directions of the OBD; modern machine learning and large datasets do not automatically resolve the conclusion reversal problem; and further work on this problem is needed.
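The reference-group choice is easy to see in the classic linear two-fold decomposition. Suppose we have OLS fits $\hat\beta_A, \hat\beta_B$ (intercept included) and covariate means $\bar x_A, \bar x_B$. The mean gap $\bar y_A - \bar y_B$ can then be split using either group's coefficients as the reference. Both splits sum to the same gap, but the "explained" part differs.

The sketch below uses synthetic data; the function names and data-generating process are hypothetical.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients with an intercept column prepended."""
    Xc = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    return beta

def oaxaca_blinder(X_a, y_a, X_b, y_b, reference):
    """Two-fold decomposition of mean(y_a) - mean(y_b) into (explained, unexplained)."""
    b_a, b_b = ols(X_a, y_a), ols(X_b, y_b)
    xa = np.concatenate([[1.0], X_a.mean(axis=0)])
    xb = np.concatenate([[1.0], X_b.mean(axis=0)])
    if reference == "b":   # group B's coefficients price the covariate gap
        return (xa - xb) @ b_b, xa @ (b_a - b_b)
    if reference == "a":   # group A's coefficients price the covariate gap
        return (xa - xb) @ b_a, xb @ (b_a - b_b)
    raise ValueError("reference must be 'a' or 'b'")

rng = np.random.default_rng(0)
X_a = rng.normal(1.0, 1.0, size=(500, 1))
X_b = rng.normal(0.0, 1.0, size=(500, 1))
y_a = 1.0 + 2.0 * X_a[:, 0] + rng.normal(size=500)
y_b = 0.5 - 1.0 * X_b[:, 0] + rng.normal(size=500)
```

In this synthetic setup, the two groups' slopes have opposite signs. As a result, the covariate gap widens the outcome gap under one reference and explains most of it under the other, a small instance of the conclusion reversal the paper studies.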

2603.20253 2026-06-01 physics.comp-ph cs.AI cs.DC cs.LG

SimulCost: A Cost-Aware Benchmark and Toolkit for Automating Physics Simulations with LLMs

Yadi Cao, Sicheng Lai, Jiahe Huang, Yang Zhang, Zach Lawrence, Rohan Bhakta, Izzy F. Thomas, Mingyun Cao, Chung-Hao Tsai, Zihao Zhou, Yidong Zhao, Hao Liu, Alessandro Marinoni, Alexey Arefiev, Rose Yu

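AI summary: Addressing the neglect of tool-use costs in existing LLM evaluations, proposes the SimulCost benchmark. It compares LLMs with traditional scanning methods on accuracy and computational cost in single-round and multi-round parameter-tuning tasks. It finds that LLM initial guesses are unreliable for high-accuracy tasks and that the multi-round mode is less efficient than scanning.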

Comments accepted version at ICML

Abstract

Evaluating LLM agents for scientific tasks has focused on token costs while ignoring tool-use costs such as simulation time and experimental resources. As a result, metrics like pass@k become impractical under realistic budget constraints. To address this gap, we introduce SimulCost, the first benchmark targeting cost-sensitive parameter tuning in physics simulations. SimulCost compares LLMs that tune cost-sensitive parameters against a traditional scanning approach in both accuracy and computational cost. It spans 2,947 single-round (initial guess) and 1,931 multi-round (trial-and-error adjustment) tasks across 13 simulators from fluid dynamics, solid mechanics, and plasma physics. Each simulator's cost is analytically defined and platform-independent. Frontier LLMs achieve 46-65% success rates in single-round mode, dropping to 35-55% under high accuracy requirements, which makes their initial guesses unreliable, especially for high-accuracy tasks. Multi-round mode improves success rates to 72-81%, but LLMs are 1.5-2.5x slower than traditional scanning, making them an uneconomical choice. We also investigate parameter-group correlations for knowledge-transfer potential, as well as the impact of in-context examples and reasoning effort, providing practical implications for deployment and fine-tuning. We open-source SimulCost as a static benchmark and extensible toolkit to facilitate research on cost-aware agentic designs for physics simulations and on adding new simulation environments. Code and data are available at https://github.com/Rose-STL-Lab/SimulCost-Bench.
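For intuition about a scanning baseline under an analytically defined cost, the sketch below tunes the number of time steps of a toy simulator. The simulator is explicit Euler for $dy/dt = -y$ on $[0, 1]$. The scan doubles the step count until the error meets a tolerance, charging one unit of cost per time step. The simulator, cost model and doubling schedule are hypothetical and are not taken from SimulCost.

```python
import math

def euler_decay(n_steps):
    """Toy simulator: explicit Euler for dy/dt = -y, y(0) = 1, integrated to t = 1."""
    y, dt = 1.0, 1.0 / n_steps
    for _ in range(n_steps):
        y -= dt * y
    return y

def scan_steps(tol, start=2, max_steps=2**20):
    """Double n_steps until |y_n - e^{-1}| < tol; return (chosen n, total cost spent)."""
    n, total_cost = start, 0
    while n <= max_steps:
        total_cost += n   # cost model: one unit per simulated time step
        if abs(euler_decay(n) - math.exp(-1.0)) < tol:
            return n, total_cost
        n *= 2
    raise RuntimeError("tolerance not reached within max_steps")
```

Every failed trial still adds to the bill, so the total cost exceeds the cost of the final run. A tuning agent would replace the fixed doubling schedule with its own proposals, and the same cost accounting keeps the two strategies comparable.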

2603.26506 2026-06-01 q-bio.NC cs.LG

Identifying Connectivity Distributions from Neural Dynamics Using Flows

Timothy Doyeon Kim, Ulises Pereira-Obilinovic, Yiliu Wang, Eric Shea-Brown, Uygar Sümbül

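AI summary: To address the non-identifiability of connectivity structure in low-rank recurrent neural networks (lrRNNs), proposes an inference framework based on maximum entropy and continuous normalizing flows (CNFs), trained via flow matching. The framework learns a distribution over connection weights that matches the observed dynamics while remaining maximally unbiased, and is applied to synthetic data and real neural recordings.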

Abstract

Connectivity structure shapes neural computation, but inferring this structure from population recordings is degenerate: multiple connectivity structures can generate identical dynamics. Recent work uses low-rank recurrent neural networks (lrRNNs) to infer low-dimensional latent dynamics and connectivity from observed activity, enabling a mechanistic interpretation of the dynamics. However, standard approaches for training lrRNNs can recover spurious structures irrelevant to the underlying dynamics. We first characterize the identifiability of connectivity structures in lrRNNs and determine conditions under which a unique solution exists. To find such solutions, we develop an inference framework based on maximum entropy and continuous normalizing flows (CNFs), trained via flow matching. Instead of estimating a single connectivity matrix, our method learns a distribution over connection weights that is maximally unbiased over unidentifiable components while matching the observed dynamics. This approach captures complex yet necessary distributions such as heavy-tailed connectivity found in empirical data. We validate our method on synthetic datasets with connectivity structures that generate multistable attractors, limit cycles, and ring attractors, and demonstrate its applicability in recordings from rat frontal cortex during decision-making. Our framework shifts circuit inference from recovering connectivity to identifying which connectivity structures are computationally required, and which are artifacts of underconstrained inference.
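A rank-one example shows why only the joint distribution of connectivity loadings matters. Take $J = \frac{1}{N} m n^\top$ and $x(0) = m\kappa(0)$. The activity then stays on the line spanned by $m$, and the latent variable obeys

$$\dot\kappa = -\kappa + \frac{1}{N}\sum_i n_i \tanh(m_i\kappa).$$

This depends on the pairs $(m_i, n_i)$ only through their empirical distribution. The sketch checks this numerically. The tanh nonlinearity, Euler discretisation and Gaussian loadings are assumptions for illustration.

```python
import numpy as np

def simulate_full(m, n, kappa0, dt=0.01, steps=500):
    """Euler simulation of dx/dt = -x + (1/N) m n^T tanh(x), from x(0) = m * kappa0."""
    N = len(m)
    x = m * kappa0
    traj = [x.copy()]
    for _ in range(steps):
        x = x + dt * (-x + m * (n @ np.tanh(x)) / N)
        traj.append(x.copy())
    return np.array(traj)

def simulate_latent(m, n, kappa0, dt=0.01, steps=500):
    """The same dynamics reduced to the scalar latent kappa, where x(t) = m * kappa(t)."""
    N = len(m)
    kappa = kappa0
    traj = [kappa]
    for _ in range(steps):
        kappa = kappa + dt * (-kappa + (n @ np.tanh(m * kappa)) / N)
        traj.append(kappa)
    return np.array(traj)

rng = np.random.default_rng(0)
N = 1000
m = rng.normal(size=N)
n = 2.0 * m + 0.5 * rng.normal(size=N)   # correlated loadings (overlap about 2): bistable latent
full = simulate_full(m, n, kappa0=0.3)
latent = simulate_latent(m, n, kappa0=0.3)
```

Relabelling the neurons, or resampling fresh loadings from the same joint distribution, leaves the latent trajectory essentially unchanged. The individual weights therefore cannot be recovered from the dynamics, which motivates learning a distribution over weights instead.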

2601.11702 2026-06-01 cs.HC cs.AI

PASTA: A Scalable Framework for Multi-Policy AI Compliance Evaluation

Yu Yang, Ig-Jae Kim, Dongwook Yoon

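AI summary: Proposes PASTA, a framework that enables fast, low-cost multi-policy AI compliance evaluation. It combines a model-card format, policy normalization, an LLM-driven pairwise evaluation engine, and an interpretable interface. Expert evaluation shows high agreement with human judgments.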

Comments 28 pages, 7 figures

Abstract

AI compliance is becoming increasingly critical as AI systems grow more powerful and pervasive. Yet the rapid expansion of AI policies creates substantial burdens for resource-constrained practitioners lacking policy expertise. Existing approaches typically address one policy at a time, making multi-policy compliance costly. We present PASTA, a scalable compliance tool integrating four innovations: (1) a comprehensive model-card format supporting descriptive inputs across development stages; (2) a policy normalization scheme; (3) an efficient LLM-powered pairwise evaluation engine with cost-saving strategies; and (4) an interface delivering interpretable evaluations via compliance heatmaps and actionable recommendations. Expert evaluation shows that PASTA's judgments closely align with those of human experts ($\rho \geq 0.626$). The system evaluates five major policies in under two minutes at a cost of approximately \$3. A user study (N = 12) confirms that practitioners found the outputs easy to understand and actionable, introducing a novel framework for scalable automated AI governance.

2510.02578 2026-06-01 q-bio.BM cs.LG

FLOWR.root: A flow matching based foundation model for joint multi-purpose structure-aware 3D ligand generation and affinity prediction

Julian Cremer, Tuan Le, Mohammad M. Ghahremanpour, Emilia Sługocka, Filipe Menezes, Djork-Arné Clevert

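AI summary: Proposes FLOWR.root, an SE(3)-equivariant flow-matching model for pocket-aware 3D ligand generation, potency and binding-affinity prediction, and confidence estimation. It supports de novo generation, conditional sampling, fragment elaboration and replacement, and multi-endpoint affinity prediction. It achieves state-of-the-art performance in unconditional molecule generation and pocket-conditional ligand generation, and surpasses existing methods in affinity prediction through parameter-efficient finetuning.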

Abstract

We present FLOWR.root, an SE(3)-equivariant flow-matching model for pocket-aware 3D ligand generation with joint potency and binding affinity prediction and confidence estimation. The model supports de novo generation, interaction- and pharmacophore-conditional sampling, fragment elaboration and replacement, and multi-endpoint affinity prediction (pIC50, pKi, pKd, pEC50). Training combines large-scale ligand libraries with mixed-fidelity protein-ligand complexes, refined on curated co-crystal datasets and adapted to project-specific data through parameter-efficient finetuning. The base FLOWR.root model achieves state-of-the-art performance in unconditional 3D molecule and pocket-conditional ligand generation. On HiQBind, the pre-trained and finetuned model demonstrates highly accurate affinity predictions, and outperforms recent state-of-the-art methods such as Boltz-2 on the FEP+/OpenFE benchmark with substantial speed advantages. However, we show that addressing unseen structure-activity landscapes requires domain adaptation; parameter-efficient LoRA finetuning yields marked improvements on diverse proprietary datasets and PDE10A. Joint generation and affinity prediction enable inference-time scaling through importance sampling, steering design toward higher-affinity compounds. Case studies validate this: selective CK2$\alpha$ ligand generation against CLK3 shows significant correlation between predicted and quantum-mechanical binding energies. Scaffold elaboration on ER$\alpha$, TYK2, and BACE1 demonstrates strong agreement between predicted affinities and QM calculations while confirming geometric fidelity. By integrating structure-aware generation, affinity estimation, property-guided sampling, and efficient domain adaptation, FLOWR.root provides a comprehensive foundation for structure-based drug design from hit identification through lead optimization.
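Inference-time scaling by importance sampling can be sketched in three steps:

1. Draw candidates from the generator.
2. Weight each candidate by a tempered function of its predicted affinity.
3. Resample according to those weights.

The exponential weighting, the temperature, the placeholder molecules and the function name are illustrative assumptions, not FLOWR.root's actual procedure.

```python
import numpy as np

def importance_resample(candidates, predicted_affinity, temperature, k, rng):
    """Resample k candidates with self-normalised weights proportional to exp(affinity / T).

    predicted_affinity: e.g. predicted pKi or pIC50 values (higher = tighter binding).
    """
    scores = np.asarray(predicted_affinity, dtype=float) / temperature
    w = np.exp(scores - scores.max())   # subtract the max for numerical stability
    w /= w.sum()
    idx = rng.choice(len(candidates), size=k, replace=True, p=w)
    return [candidates[i] for i in idx], w

rng = np.random.default_rng(0)
mols = ["mol_A", "mol_B", "mol_C", "mol_D"]   # placeholders for generated ligands
pred_pki = [5.1, 6.8, 7.9, 4.2]
picked, weights = importance_resample(mols, pred_pki, temperature=0.5, k=8, rng=rng)
```

Lowering the temperature concentrates samples on the highest-scoring candidates at the cost of diversity. Raising it approaches uniform resampling.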

2603.22867 2026-06-01 cs.AR cs.AI cs.LG

TRINE: A Token-Aware, Runtime-Adaptive FPGA Inference Engine for Multimodal AI

Hyunwoo Oh, Hanning Chen, Sanggeon Yun, Yang Ni, Suyeon Jang, Behnam Khaleghi, Fei Wen, Mohsen Imani

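AI summary: To address the difficulty embedded platforms have in meeting real-time targets given the divergent compute/memory patterns of multimodal AI, proposes TRINE, a single-bitstream FPGA accelerator and compiler that needs no reconfiguration. It performs end-to-end multimodal inference through unified layer mapping, runtime mode switching, token pruning, and dependency-aware layer offloading. On Alveo U50 and ZCU104 it reduces latency by up to 22.57x and 6.86x relative to the RTX 4090 and Jetson Orin Nano, respectively, at only 20-21 W.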

Comments Accepted to DAC 2026

Abstract

Multimodal stacks that mix ViTs, CNNs, GNNs, and transformer NLP strain embedded platforms because their compute/memory patterns diverge and hard real-time targets leave little slack. TRINE is a single-bitstream FPGA accelerator and compiler that executes end-to-end multimodal inference without reconfiguration. Layers are unified as DDMM/SDDMM/SpMM and mapped to a mode-switchable engine. On a shared PE array, this engine toggles at runtime among weight/output-stationary systolic, 1xCS SIMD, and a routable adder tree (RADT). A width-matched, two-stage top-k unit enables in-stream token pruning, while dependency-aware layer offloading (DALO) overlaps independent kernels across reconfigurable processing units to sustain utilization. Evaluated on Alveo U50 and ZCU104, TRINE reduces latency by up to 22.57x vs. RTX 4090 and 6.86x vs. Jetson Orin Nano at 20-21 W; token pruning alone yields up to 7.8x on ViT-heavy pipelines, and DALO contributes up to 79% throughput improvement. With int8 quantization, accuracy drops remain <2.5% across representative tasks, delivering state-of-the-art latency and energy efficiency for unified vision, language, and graph workloads, all in a single bitstream.
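The two-stage top-k used for in-stream token pruning can be sketched in software:

- tokens arrive in fixed-width chunks;
- stage 1 keeps the top-k scores within each chunk;
- stage 2 selects the global top-k from the survivors.

With distinct scores, every global top-k token is also in the top-k of its own chunk, so the result matches a one-shot global top-k. The chunk width, the random importance scores and the function name are assumptions; this is a functional model, not TRINE's hardware design.

```python
import numpy as np

def two_stage_topk(scores, k, width):
    """Indices of the k highest scores via per-chunk top-k followed by a global top-k."""
    scores = np.asarray(scores)
    survivors = []
    for start in range(0, len(scores), width):
        chunk = scores[start:start + width]
        kk = min(k, len(chunk))
        local = np.argpartition(chunk, -kk)[-kk:]   # stage 1: top-kk inside this chunk
        survivors.extend(start + local)
    survivors = np.array(survivors)
    best = np.argsort(scores[survivors])[::-1][:k]  # stage 2: global top-k of survivors
    return np.sort(survivors[best])

rng = np.random.default_rng(0)
token_scores = rng.random(197)   # e.g. 196 patch tokens plus a [CLS] token
kept = two_stage_topk(token_scores, k=32, width=16)
```

Stage 1 needs only one chunk of storage at a time, which is what makes a streaming, width-matched implementation plausible.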