arXivDaily: daily arXiv digest, updated Monday through Friday
2605.29353 2026-05-29 cs.CR cs.CV

DeepFake Forensics AI: A Multi-Modal Detection and Blockchain-Anchored Evidence Management Platform

Naisha Minnah

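AI summary: Proposes a unified platform that trains four neural networks to detect synthetic media in images, video, and audio, and uses the Ethereum blockchain for immutable storage and management of evidence.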

Comments 5 pages, 5 figures, 3 tables

English abstract

The proliferation of AI-generated synthetic media poses a critical threat to the integrity of digital evidence in legal and forensic contexts. Existing deepfake detection systems typically address a single modality and provide no mechanism for tamper-proof evidence preservation. We present DeepFake Forensics AI, a unified platform that detects synthetic media across image, video, and audio modalities, identifies generative architecture fingerprints, and anchors forensic evidence immutably on the Ethereum blockchain. Our system trains four independent neural networks from scratch: an EfficientNet-B4 image detector (AUC = 0.9868), a Bidirectional LSTM video detector (AUC = 0.9628), an ECAPA-TDNN audio detector (EER = 18.63%), and a novel GAN fingerprinting module (accuracy = 99.88%) that identifies the generative architecture behind a fake image. Evidence files are hashed with SHA-256, stored on IPFS via Pinata, and registered on-chain via a Solidity smart contract with role-based access control. The platform provides a React frontend and FastAPI backend suitable for deployment in forensic and legal workflows. To our knowledge, this is the first system to unify multi-modal deepfake detection with blockchain-based chain-of-custody management.
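The hashing step mentioned in the abstract can be illustrated with a minimal sketch. It covers only the SHA-256 digest and a later integrity check; the function names are hypothetical, and the IPFS upload and smart-contract registration are not modelled here.

```python
import hashlib


def evidence_digest(data: bytes) -> str:
    """Return the SHA-256 digest of an evidence file's bytes as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def is_untampered(data: bytes, registered_digest: str) -> bool:
    """Compare a freshly computed digest with the one recorded at registration."""
    return evidence_digest(data) == registered_digest.lower()


# Stand-in bytes for a media file (assumption: real use would read the file).
original = b"abc"
registered = evidence_digest(original)

print(registered)
print(is_untampered(original, registered))         # True
print(is_untampered(original + b"!", registered))  # False
```

Because any change to the file changes the digest, storing only the digest on-chain is enough to detect later modification of the off-chain copy.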

2605.29329 2026-05-29 q-bio.QM cs.LG

Mixing Vector Model for Copolymer Inference via Mixed Integer Linear Programming

Jianshen Zhu, Raveena Rai, Taiyo Sohkawa, Naveed Ahmed Azam, Kazuya Haraguchi, Liang Zhao, Tatsuya Akutsu

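AI summary: Proposes the mixing vector model, which enables inverse design of copolymers via mixed integer linear programming, achieving high predictive accuracy on several physicochemical datasets while remaining tractable.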

English abstract

A novel two-phase molecule inference framework, mol-infer, has recently been developed to infer chemical graphs with prescribed abstract structures and desired property values through mixed integer linear programming (MILP) under the two-layered model, with guaranteed optimality and exactness relative to the given learned prediction function and structural constraints. In this study, we extend this framework to copolymers by introducing a simple feature representation, called the mixing vector (MV) model. In the proposed model, a copolymer feature vector is represented as a convex combination of MILP-tractable monomer descriptors weighted by the mixing ratio of the constituent monomers. This representation does not require explicit sequence-class information and is therefore naturally compatible with MILP-based inverse design. Under this model, we construct prediction functions for several copolymer property datasets using artificial neural networks, reduced quadratic multiple linear regression, and random forests. The proposed representation achieves practically useful predictive performance across multiple physicochemical property datasets; in particular, the best test R^2 score exceeds 0.7 for nine of the ten datasets and exceeds 0.9 for six datasets. We also formulate a multi-monomer inverse-design problem under the MV representation with a prescribed mixing ratio and show that the resulting MILP instances remain tractable, even for three-monomer settings. Finally, we perform an external consistency check by re-evaluating the inferred candidates and comparing the re-computed property values with those predicted by the learned model. Overall, the proposed framework gives a tractable first step toward model-level exact inverse design of copolymers under the two-layered model.
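The convex-combination representation described above can be sketched as follows. The descriptor values and the function name `mixing_vector` are illustrative assumptions; the paper's actual monomer descriptors are not given in the abstract.

```python
from typing import Sequence


def mixing_vector(
    descriptors: Sequence[Sequence[float]], ratios: Sequence[float]
) -> list[float]:
    """Convex combination of monomer descriptor vectors weighted by mixing ratios."""
    if not descriptors or len(descriptors) != len(ratios):
        raise ValueError("need one mixing ratio per monomer descriptor")
    if any(r < 0 for r in ratios) or abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("mixing ratios must be non-negative and sum to 1")
    dim = len(descriptors[0])
    if any(len(d) != dim for d in descriptors):
        raise ValueError("all descriptors must have the same length")
    # Component k of the copolymer feature is the ratio-weighted sum over monomers.
    return [sum(r * d[k] for d, r in zip(descriptors, ratios)) for k in range(dim)]


# Two hypothetical monomers with 2-dimensional descriptors, mixed 25:75.
print(mixing_vector([[1.0, 0.0], [3.0, 4.0]], [0.25, 0.75]))  # [2.5, 3.0]
```

With the mixing ratios prescribed, the feature vector is linear in the monomer descriptors, which is what keeps the representation compatible with a linear programming formulation.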

2605.29318 2026-05-29 cs.GR cs.CV

FreeForm: Reduced-Order Deformable Simulation from Particle-Based Skinning Eigenmodes

Donglai Xiang, Vismay Modi, Rishit Dagli, Ty Trusty, Gilles Daviet, Anka He Chen, Nicholas Sharp, David I. W. Levin

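AI summary: Proposes a mesh-free, reduced-order simulation method for hyperelastic objects based on the Reproducing Kernel Particle Method; reduced-order skinning weights are built by solving a generalized eigensystem on the Hessian of the elastic energy, giving a 40x training speedup and lower simulation error.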

Comments CVPR 2026, project website: https://research.nvidia.com/labs/sil/projects/freeform/

English abstract

We present a novel formulation for mesh-free, reduced-order simulation of deformable hyperelastic objects. Existing work in reduced-order elastodynamic simulation represents the input geometry by either meshes, which can be difficult to obtain due to challenges in scanning and triangulating complex shapes, or by neural fields that require per-shape optimization. We propose to adopt a Reproducing Kernel Particle Method (RKPM) representation, which enables the construction of reduced-order skinning weights by solving a generalized eigensystem on the Hessian matrix of the elastic energy. We demonstrate that this formulation not only leads to a 40x training speedup compared with the per-shape optimization of neural fields, but also achieves lower simulation error when evaluated against the converged results of the finite element method. We show our simulation results on a wide variety of objects in different representations, including meshes and Gaussian splats, as well as the application of our method in the downstream task of robot simulation.

2605.29277 2026-05-29 cs.SE cs.AI

Code-QA-Bench: Separating Code Reasoning from Documentation Memorization in Repository-Level QA

Jun Zhang, JianYing Qu, Hanwen Du, Zhongkai Sun, Yehua Yang, Qiao Zhao

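AI summary: Proposes the Code-QA-Bench framework, which uses answer-first generation and a three-condition experimental design to automatically build repository-level code understanding benchmarks that separate the effects of code reasoning, documentation recall, and pretraining memorization.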

English abstract

We present Code-QA-Bench, a fully automated framework for synthesizing repository-level code understanding benchmarks that separates genuine code comprehension from documentation recall and pretraining memorization. The framework makes two methodological contributions: (1) an answer-first generation pipeline where a tool-equipped agent explores source code to produce verified gold answers before deriving questions, ensuring every task is grounded in real code structure; and (2) a three-condition experimental design evaluating agents under closed-book (no repository), code-only (documentation removed), and documented (full repository) conditions, with deltas directly quantifying documentation utility and memorization. We generate 528 code-derivable and 100 doc-dependent tasks across 10 Python repositories from SWE-Bench, scored by an LLM judge on accuracy, completeness, and specificity. Experiments on four frontier models reveal that code access is the dominant factor (+0.23 mean gain over closed-book), documentation provides modest additional benefit (+0.071 on doc-dependent tasks), and code-only $\approx$ documented on code-derivable tasks, validating the design. The framework is open-source and applicable to any well-documented Python repository.
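The three-condition design can be read as simple score differences. The sketch below is one plausible reading, not the benchmark's own code: the class and key names are hypothetical, and the scores are made-up toy values rather than results from the paper.

```python
from dataclasses import dataclass


@dataclass
class ConditionScores:
    """Mean judge score for one model under each evaluation condition."""
    closed_book: float  # no repository access
    code_only: float    # repository with documentation removed
    documented: float   # full repository


def condition_deltas(s: ConditionScores) -> dict[str, float]:
    """Differences between conditions, under the assumed interpretation."""
    return {
        # What remains answerable without any repository (memorization proxy).
        "closed_book_floor": s.closed_book,
        # Gain attributable to reading the code itself.
        "code_gain": s.code_only - s.closed_book,
        # Additional gain attributable to documentation.
        "doc_gain": s.documented - s.code_only,
    }


toy = ConditionScores(closed_book=0.2, code_only=0.5, documented=0.6)  # toy values
print(condition_deltas(toy))
```

On code-derivable tasks the design predicts `doc_gain` close to zero, which is the sanity check the abstract reports.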

2605.29249 2026-05-29 stat.ML cs.LG

Prediction-Powered Inference Across Many Tasks for AI Evaluation & Social Science Research

Nicolas Emmenegger, Ellery Stahler, Chara Podimata

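AI summary: Proposes a multi-task prediction-powered inference framework that exploits shared structure through cross-task recalibration to improve the efficiency of statistical inference when labels are scarce, and proves that nonlinear structure is necessary for cross-task gains.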

English abstract

Many applications require statistically valid inference across many related tasks, while using only a handful of high-quality labels per hypothesis. In AI evaluation, these tasks may correspond to model behaviors across prompts, subgroups, or hypotheses; in social science surveys, they may correspond to related questions, populations, or measurement conditions. Prediction-powered inference (PPI) uses abundant but inexpensive proxy measurements to improve inference from limited, ground-truth labels, but commonly used methods treat tasks independently and therefore fail to exploit shared structure across related tasks. This limitation is especially important in settings where only a small number of labels are available per task. To address this issue, we introduce a multi-task prediction-powered inference framework that uses labeled data from related tasks to improve power while preserving task-specific inference. Our methods exploit the shared structure in the proxy-ground-truth relationship through cross-task recalibration, while retaining within-task rectification and power tuning to construct accurate point estimates and confidence intervals. We prove that efficiency gains beyond power-tuned PPI are only possible when the proxy-ground-truth relationship contains nonlinear structure; affine cross-task recalibrations are asymptotically equivalent to using the original proxy. We complement our theoretical findings with experiments on synthetic and semi-synthetic datasets, as well as a case study auditing language models on election-related information during the 2024 U.S. presidential election. Using a large human-annotation study, we show that cross-task recalibration can substantially reduce confidence interval widths when labels are scarce.
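To make the single-task building block concrete, here is a minimal sketch of a power-tuned prediction-powered estimate of a mean. The plug-in formula for the tuning weight follows the common PPI++-style choice and should be treated as an assumption, as should all names and the toy data; the paper's multi-task method is not reproduced. The example also shows why an affine change of the proxy cannot help: the tuned weight rescales to cancel it.

```python
from statistics import fmean


def _cov(a, b):
    """Unbiased sample covariance of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def ppi_mean(y_lab, f_lab, f_unlab, lam=None):
    """Estimate E[Y] from few labels plus proxy predictions.

    y_lab:   ground-truth labels on the small labeled set
    f_lab:   proxy predictions on the same labeled points
    f_unlab: proxy predictions on the large unlabeled set
    lam:     power-tuning weight; lam=1 is classical PPI, lam=0 ignores the proxy
    """
    n, big_n = len(y_lab), len(f_unlab)
    if lam is None:
        # Assumed plug-in weight: Cov(Y, f) / ((1 + n/N) * Var(f)).
        lam = _cov(y_lab, f_lab) / ((1 + n / big_n) * _cov(f_lab, f_lab))
    return fmean(y_lab) + lam * (fmean(f_unlab) - fmean(f_lab))


y = [1, 0, 1, 1, 0]
f = [0.9, 0.2, 0.7, 0.6, 0.4]
f_big = [0.8, 0.3, 0.5, 0.9, 0.1, 0.6, 0.7, 0.2]

base = ppi_mean(y, f, f_big)
# Affine recalibration of the proxy: g = 3 * f + 2.
affine = ppi_mean(y, [3 * v + 2 for v in f], [3 * v + 2 for v in f_big])
print(abs(base - affine) < 1e-9)  # True: the tuned estimate is unchanged
```

The invariance holds exactly in this finite-sample sketch because the tuned weight scales by the inverse of the affine slope, consistent with the abstract's statement that gains require nonlinear structure.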

2605.29245 2026-05-29 cs.CR cs.CL cs.LG

Implicit Identity Technologies for LLMs: Fingerprinting and Watermarking across Datasets, Models, and Generated Content

Bing Liu, Shunping Wang, Yufan Zhu, Xinyi Yu, Jing Huang, Linkang Du, Hongbin Pei, Wei Luo

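AI summary: Surveys LLM fingerprinting and watermarking, proposes implicit identity as a unifying abstraction, organizes techniques for datasets, models, and generated content under a lifecycle-based taxonomy, and establishes an evaluation framework.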

Comments Accepted by IJCAI-ECAI 2026. 11 pages, 1 figure. Survey and taxonomy of LLM fingerprinting and watermarking for identity, provenance, generated-content attribution, and asset protection

English abstract

This paper presents a survey and taxonomy of LLM fingerprinting and watermarking for identity, ownership verification, provenance, and generated-content attribution. Large language models (LLMs) require substantial investments in data, computation, and expertise, and are increasingly deployed in high-stakes settings, making it critical to protect LLM-related assets and trace their origins. Existing work has rapidly expanded across dataset provenance, model ownership, and generated-content detection, but the field remains fragmented: fingerprinting and watermarking are often used inconsistently, and methods are typically studied within isolated asset-specific settings. To address this gap, we introduce implicit identity as a unifying abstraction for verifiable but not directly observable identity signals in LLM systems. We distinguish fingerprinting as non-intrusive identity derived from intrinsic characteristics, and watermarking as intrusive identity deliberately embedded into data, models, or generated content. We then propose a lifecycle-based taxonomy that organises techniques across datasets, models, and generated content, and further separates them by verification semantics: similarity-based attribution and keyed verification. Finally, we establish an evaluation framework centred on identifiability, robustness, and deployability, summarising representative metrics under realistic access and transformation regimes. By unifying terminology, lifecycle stages, and evaluation objectives, this survey provides a structured foundation for studying LLM identity technologies and for developing more reliable mechanisms for asset protection and provenance.

2605.29191 2026-05-29 eess.SY cs.RO cs.SY math.OC

Distributed Non-Uniform Scaling Control of Multi-Agent Formation with Dynamic Agent Joining

Tao He, Gangshan Jing

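AI summary: For multi-agent formations with dynamically joining agents, proposes a distributed non-uniform scaling control framework that adjusts the formation shape in arbitrary dimensions while preserving the spectral properties of the graph Laplacian.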

Comments This paper has been accepted by IFAC 2026

English abstract

Non-uniform scaling control of formation enables multi-agent systems to adjust their shape by scaling with different ratios along different coordinate axes, offering enhanced flexibility in complex environments. However, like most existing formation maneuver strategies, it typically assumes a fixed set of agents, limiting its applicability in scenarios requiring dynamic team expansion. This paper introduces a distributed control framework that enables a formation to incorporate new agents during non-uniform scaling maneuvers in arbitrary dimensions while preserving the spectral properties of the graph Laplacian. Simulation examples validate the effectiveness of the theoretical results.

2605.29141 2026-05-29 cs.IR cs.AI

Toward User Preference Alignment in LLM Recommendation via Explicit Context Feedback

Weizhi Zhang, Wooseong Yang, Yuxin Cui, Zhaohui Guo, Hins Hu, Liangwei Yang, Henry Peng Zou, Qifei Wang, Hanqing Zeng, Jiayi Liu, Yinglong Xia, Philip S. Yu

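AI summary: Argues that LLM-based recommender systems should prioritize explicit context feedback (such as review text) to align with user preferences and improve the personalization and explainability of recommendations.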

Comments Published in CogMI 2025. https://ieeexplore.ieee.org/abstract/document/11417068

English abstract

Traditional recommender systems (RecSys) primarily infer user preferences from implicit signals (such as clicks, watches, and purchases), often neglecting the rich explicit contextual feedback users provide through verbal text, like comments and reviews. This explicit context feedback captures the nuanced reasons behind user decisions regarding their preferences. In addition, it offers critical heterogeneous information for user preference alignment and more explainable recommendations. Overlooking such signals can lead to misaligned user preferences and further reinforce filter bubbles, as algorithms fail to understand the "semantic context" behind user choices. Recent advances in Large Language Models (LLMs) present new opportunities to harness user-generated content for more accurate and diverse recommendations, yet current LLM-based recommendations still focus on using item meta-data and underutilize this resource. In this paper, we advocate for prioritizing explicit context feedback in the next generation of LLM-based RecSys. We review the evolution of recommendation paradigms, highlight the value of context-rich feedback, call for new benchmarks and metrics, and introduce frameworks for integrating explicit user signals into scalable LLM-driven RecSys. Centering on user-preference modeling, we aim to foster more personalized, transparent, and explainable RecSys online platforms.

2605.29139 2026-05-29 stat.ML cs.LG

Anytime-Valid Federated Conformal RAG for LLM Swarms

Prasanjit Dubey, Xiaoming Huo

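AI summary: Proposes Anytime-FC-RAG, which uses a summable per-step calibration-deviation budget and a truncated betting e-process to extend federated conformal RAG to sequential coverage valid at any stopping time, with guarantees of time-uniform alarm validity, a Hoeffding-stitched cumulative-miscoverage envelope, and safety under adaptive control.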

English abstract

Federated Conformal RAG (FC-RAG) provides distribution-free coverage for a bandwidth-limited swarm of weak language models, but only at a fixed horizon. We extend it to anytime-valid sequential coverage: validity at every stopping time, preserved under predictable adaptive control (recalibration, per-node bandwidth escalation, distilled-student refresh), at no extra cost in assumptions over fixed-horizon FC-RAG. Naive composition fails because FC-RAG's marginal coverage bound makes the betting e-process a non-supermartingale on adverse calibration draws, and Ville's inequality cannot be invoked. We give Anytime-FC-RAG, a sequential extension built on a summable per-step calibration-deviation budget that converts the marginal bound into a strict conditional bound on a calibration-good event, paired with a truncated betting e-process that is a nonnegative supermartingale on the entire probability space. From these two ingredients, we obtain four guarantees: time-uniform alarm validity $\mathbb{P}(\sup_t E_t \ge 1/\delta_e) \le \delta_e + \delta_{\mathrm{cal}}$, a Hoeffding-stitched cumulative-miscoverage envelope at the same total budget, safety under any predictable controller (recalibration, bandwidth escalation, student refresh), and training-side error propagation across an unbounded sequence of Federated Probe-Logit Distillation (FPLD) refreshes via a summable training budget. As a practical consequence, an adaptive controller that escalates retrieval bandwidth only when the e-process crosses a warning threshold matches the alarm rate of a fixed-high-bandwidth schedule at substantially lower communication cost. Experiments on a GPT-2-small + MiniLM swarm across MMLU, DBpedia, and AG News verify the predicted alarm rate, detection delay, envelope coverage, and $14$-$57\%$ bandwidth savings; the alarm fires when and only when coverage genuinely breaks.
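The alarm mechanism can be illustrated with a generic betting e-process on miscoverage indicators. This is a textbook-style sketch under stated assumptions, not the paper's truncated construction: it uses a fixed bet `lam`, omits the calibration budget, and all names are hypothetical. If the conditional miscoverage probability never exceeds `alpha`, each factor has expectation at most 1, so the running product is a nonnegative supermartingale and Ville's inequality bounds the chance of ever reaching `1/delta` by `delta`.

```python
def betting_e_process(miscovered, alpha, lam, delta):
    """Run a fixed-bet e-process over a stream of miscoverage indicators.

    miscovered: iterable of 0/1 flags (1 = the prediction set missed the truth)
    alpha:      target miscoverage level under the null
    lam:        bet size in [0, 1/alpha], which keeps every factor nonnegative
    delta:      alarm level; the alarm fires when the process reaches 1/delta
    Returns (path, alarm_step) with alarm_step 1-indexed or None.
    """
    if not 0 < alpha < 1 or not 0 < delta < 1 or not 0 <= lam <= 1 / alpha:
        raise ValueError("invalid alpha, delta, or lam")
    e_value, path, alarm_step = 1.0, [], None
    for t, x in enumerate(miscovered, start=1):
        # Wealth grows on a miss and shrinks on a cover.
        e_value *= 1.0 + lam * (float(x) - alpha)
        path.append(e_value)
        if alarm_step is None and e_value >= 1.0 / delta:
            alarm_step = t
    return path, alarm_step


# Persistent misses: factors of 2.8 reach the threshold 20 at the third step.
print(betting_e_process([1, 1, 1, 1], alpha=0.1, lam=2.0, delta=0.05)[1])  # 3
# No misses: the process only shrinks and never alarms.
print(betting_e_process([0] * 50, alpha=0.1, lam=2.0, delta=0.05)[1])  # None
```

An adaptive controller of the kind described in the abstract would watch `path` and escalate bandwidth when it crosses a lower warning threshold, before the alarm level is reached.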

2605.29121 2026-05-29 math.DS cs.AI cs.LG

A Minimal Bifurcation Model of Load Imbalance in a Softmax Mixture-of-Experts Router

O. M. Kiselev

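AI summary: Proposes a minimal dynamical model of adaptive softmax routing for a two-expert Mixture-of-Experts layer, derived as the mean-field limit of a discrete reinforcement rule; a supercritical pitchfork bifurcation is found to cause load imbalance, and exact parametric equations are derived for the bifurcation set and the cusp catastrophe.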

Comments 21 pages, 11 figures

English abstract

We propose a minimal dynamical model of adaptive softmax routing for a two-expert Mixture-of-Experts (MoE) layer. The model is obtained as a mean-field limit of a discrete reinforcement rule: the selected expert receives a small score increment, while all scores undergo regularizing decay. In the symmetric case the limiting system has a supercritical pitchfork bifurcation: for weak feedback there is a unique stable balanced state, whereas above a critical feedback strength two stable asymmetric states appear. When an external asymmetry is added, the pitchfork unfolds into a pair of fold bifurcations forming a cusp in the control-parameter plane. We derive exact parametric equations for the bifurcation set and the local normal form of the cusp catastrophe. Numerical experiments connect this picture to empirical expert load, a small trainable MoE model, hard top-1 PyTorch routing, and a small classification experiment on digits. The results provide a controlled low-dimensional mechanism for abrupt transitions to load imbalance in adaptive MoE routers.
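The abstract does not state the model's equations, so the sketch below uses an assumed toy form for the symmetric case only. For two experts with scores s1 and s2, softmax gives p1 - p2 = tanh(x/2) with x = s1 - s2; if the selected expert gains score at rate `eta` and all scores decay at rate `gamma`, a plausible mean-field equation is dx/dt = eta * tanh(x/2) - gamma * x. Linearizing at x = 0 gives a critical feedback strength eta = 2 * gamma for this toy form. The parameter names and the equation itself are assumptions, not the paper's.

```python
import math


def simulate_gap(eta, gamma, x0=0.1, dt=0.01, steps=5000):
    """Forward-Euler integration of dx/dt = eta * tanh(x / 2) - gamma * x.

    x is the score gap between the two experts; x = 0 means balanced load.
    """
    x = x0
    for _ in range(steps):
        x += dt * (eta * math.tanh(x / 2.0) - gamma * x)
    return x


# Weak feedback (eta < 2 * gamma): the gap decays back to the balanced state.
print(simulate_gap(eta=1.0, gamma=1.0))
# Strong feedback (eta > 2 * gamma): the gap settles at a nonzero value,
# with the sign chosen by the initial perturbation.
print(simulate_gap(eta=4.0, gamma=1.0, x0=0.1))
print(simulate_gap(eta=4.0, gamma=1.0, x0=-0.1))
```

The two mirror-image outcomes in the strong-feedback runs are the pair of asymmetric states that a supercritical pitchfork produces; adding a constant bias term to the right-hand side would break the symmetry, which is the setting where the abstract describes fold bifurcations and a cusp.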

2605.29115 2026-05-29 cs.CR cs.AI

unix-ctf: Procedural Environments for Unix-Competence Reinforcement Learning

Geoffrey Bradway, Roger Creus Castanyer, Lorenz Wolf, Maxwill Lin, Matthew James Sargent, Augustine N. Mavor-Parker

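AI summary: Proposes unix-ctf, an environment that procedurally generates capture-the-flag tasks for shell agents; an LLM-assisted synthesis pipeline produces reusable hide-and-find script pairs, and fine-tuning Qwen3-8B on them raises the solve rate from 11.6% to 43.6%, showing that Unix competence is separable and trainable.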

English abstract

Unix competence is the ability to use shell and operating-system primitives as first-class tools, not merely to write programs through a terminal. Current terminal benchmarks tend to blur this distinction: a solver fluent in Python but weak in Unix can pass a substantial fraction of Terminal-Bench 2.0, while the reverse skill profile is rarely exercised. We make the distinction operational and build a training surface for the Unix component. unix-ctf is a procedural generator of capture-the-flag tasks for shell agents. Each task hides a short token (a flag of the form flag(a3b1c9...)) inside a fresh Linux container using a single Unix feature, and the agent must recover it. Tasks are produced by an LLM-assisted synthesis pipeline that generates candidate hiding techniques, rewrites them into parameterized hide-and-find script pairs, and filters them with a bidirectional contract: the hide script must leave no plaintext trace of the flag on disk, and the find script must recover the flag in a fresh directory. Because the LLM only writes the planting and recovery steps (the container, layout, and grading harness are fixed), the pipeline lands 656 of 750 raw attempts as portable, reusable variants (87.5%). Our reproduction of Endless Terminals' full-container-generation approach lands only 17.4% under the same checks. The 656 variants canonicalize to 155 distinct techniques. Fine-tuning Qwen3-8B with LoRA using GRPO on this surface lifts solve rate from 11.6% to 43.6% on a 15-skill multi-family holdout (n=225), redistributes which InterCode-CTF tasks the model solves, and produces a +33 pp gain in Forensics while reaching 32/100 on InterCode-CTF. These results suggest that Unix competence is separable, trainable, and best evaluated directly rather than folded into programming-through-a-shell.
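The bidirectional contract can be sketched as a pair of checks on a hide step and a find step. Everything below is a hypothetical stand-in: the real pipeline uses shell scripts inside containers, whereas this sketch uses Python functions, a temporary directory, and base64 encoding as a placeholder hiding technique. The example flag value is also made up.

```python
import base64
import tempfile
from pathlib import Path


def hide(flag: str, root: Path) -> None:
    """Hypothetical hide step: store the flag base64-encoded in a data file."""
    (root / "notes.dat").write_bytes(base64.b64encode(flag.encode()))


def find(root: Path) -> str:
    """Hypothetical find step: decode the file written by hide()."""
    return base64.b64decode((root / "notes.dat").read_bytes()).decode()


def leaves_plaintext(flag: str, root: Path) -> bool:
    """True if any file under root contains the flag bytes verbatim."""
    needle = flag.encode()
    return any(needle in p.read_bytes() for p in root.rglob("*") if p.is_file())


def contract_holds(flag: str) -> bool:
    """Check both directions in a fresh directory: no plaintext, and recoverable."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        hide(flag, root)
        return not leaves_plaintext(flag, root) and find(root) == flag


print(contract_holds("flag(0123abcd)"))  # True
```

A candidate pair that fails either direction would be filtered out, which is how a check of this shape separates reusable variants from broken ones.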

2605.29114 2026-05-29 cs.CR cs.LG cs.RO

ReasonBreak: Probing Vulnerabilities in Reasoning-Enabled Vision-Language-Action Models for Autonomous Driving

Mohammadreza Teymoorianfard, Jean-Philippe Monteuuis, Jonathan Petit, Amir Houmansadr

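AI summary: Using black-box attacks, presents the first systematic study of how reasoning-enabled vision-language-action models for autonomous driving respond to realistic input perturbations, finding that both reasoning and trajectory generation are vulnerable, which raises collision rates.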

English abstract

Vision-Language-Action (VLA) models with integrated reasoning have been proposed for end-to-end autonomous driving, assuming a tight coupling between reasoning and trajectory generation. However, the robustness of such systems under realistic input perturbations remains largely unexplored. We show that these models are highly vulnerable to realistic input perturbations, achieving up to 89% attack success rate (ASR) on reasoning and up to 72% on trajectory manipulation in closed-loop simulation, leading to increased collision rates and degraded safety metrics. Using NVIDIA's recent Alpamayo models as representative industry-developed VLAs, we conduct the first systematic black-box study of reasoning-enabled VLA models under realistic textual input corruptions, evaluating their impact on reasoning and driving behavior. We introduce a reasoning-aware evaluation framework capturing both semantic and structural aspects of reasoning, along with safety-centric measures. We also introduce a benchmark for evaluating attacks and defenses on reasoning-trajectory interactions in autonomous driving. Our results highlight the need for rigorous evaluation and improved defenses to ensure the safety of reasoning-enabled VLA systems in autonomous driving.

2605.29063 2026-05-29 eess.IV cs.CV

Accelerating HEVC Intra Partitioning via a CNN-Hierarchical Attention Transformer Hybrid

Krishna Kumar Sharma, Somdyuti Paul

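AI summary: Proposes HFViT, a hybrid architecture that fuses reparameterized depthwise-separable convolutions with a Hierarchical Attention Transformer to propagate global information efficiently at low complexity, reducing the VMAF BD-rate penalty in HEVC intra partition prediction while keeping CPU latency low.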

English abstract

The recursive quad-tree partitioning in High Efficiency Video Coding (HEVC) incurs considerable computational overhead, with exhaustive rate-distortion optimization for CTU partition prediction consuming the dominant share of encoding time. Although partition prediction through deep learning has emerged as a viable encoding accelerator, an architectural dichotomy remains largely unaddressed: CNNs are computationally efficient but spatially myopic due to their localized effective receptive fields, failing to capture long-range semantic relationships and repetitive textures; conversely, Transformer-based architectures are better at capturing global context but incur prohibitive CPU latency, a critical liability that impedes deployment in predominantly CPU-bound environments. This paper introduces Hybrid Fast Vision Transformer (HFViT), a hybrid architecture designed to accelerate HEVC intra-mode partition prediction. HFViT fuses a reparameterized depthwise-separable convolutional backbone with a Hierarchical Attention Transformer (HAT) mechanism, leveraging a carrier token scheme to enable efficient global information propagation at sub-quadratic complexity. Post-training structural fusion collapses batch normalization into preceding layers to further reduce latency. Comprehensive evaluation reveals the efficacy of HFViT in accelerating HEVC intra-encoding across resolutions. On standard JCT-VC test sequences, HFViT reduces the average VMAF BD-rate penalty by 2.4, 2.6, and 7.9 percentage points on Classes A, B, and E, respectively, compared with the competing ETH-CNN baseline, while maintaining CPU inference latency within 8% of the CNN baseline and surpassing it on GPU by 40%, establishing practical viability for real-time encoder integration.
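The batch-normalization folding mentioned above is a standard inference-time identity, sketched here for a single channel with scalar weight and bias. The function names are illustrative; a real convolution applies the same per-output-channel scale to every weight in that channel's kernel.

```python
import math


def layer_then_bn(x, weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Reference path: affine layer followed by inference-mode batch norm."""
    z = weight * x + bias
    return gamma * (z - mean) / math.sqrt(var + eps) + beta


def fold_batchnorm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold batch-norm statistics into the preceding layer's weight and bias."""
    scale = gamma / math.sqrt(var + eps)
    return weight * scale, (bias - mean) * scale + beta


# Toy parameters for one channel (assumed values, for illustration only).
params = dict(weight=0.7, bias=-0.2, gamma=1.5, beta=0.3, mean=0.1, var=4.0)
w_folded, b_folded = fold_batchnorm(**params)

for x in (-2.0, 0.0, 3.5):
    fused = w_folded * x + b_folded
    print(abs(fused - layer_then_bn(x, **params)) < 1e-12)  # True each time
```

After folding, the normalization costs nothing at inference time because it has been absorbed into parameters the layer already applies.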

2605.29059 2026-05-29 cs.SE cs.AI cs.CR

SCDBench: A Benchmark for LLM-Based Smart Contract Decompilers

Kaihua Qin, Dawn Song, Arthur Gervais

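AI summary: To address the lack of a unified benchmark for evaluating smart contract decompilation, proposes the SCDBench dataset and evaluation methodology, which tests frontier LLMs through four cumulative stages (format completeness, compilability, ABI recovery, semantic consistency) and finds that semantic consistency remains far from solved.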

English abstract

Smart contract decompilation aims to recover high-level source code from bytecode, but evaluating decompilers remains difficult because existing studies use narrow datasets, inconsistent metrics, and limited semantic consistency checks. This gap is increasingly important as large language models (LLMs) begin to generate source-like Solidity that may compile and appear plausible, even when its semantics diverge from the original contract. We introduce SCDBench, a dataset and benchmark methodology for LLM-based smart contract decompilation. The dataset contains 600 real-world Solidity contracts with paired bytecode inputs, ground-truth source code, and replayable semantic checkpoints. SCDBench evaluates decompiler outputs through four cumulative stages: format completeness, compilability, Application Binary Interface (ABI) recovery, and semantic consistency via differential replay. We evaluate Claude Opus 4.7, GPT-5.3-Codex, and GLM-5 in a zero-shot decompilation setting, including GLM-5 variants with and without extended reasoning and a zero-shot compilation-repair setting. The results show that frontier LLMs can often produce structured and compilable Solidity, but achieving semantic consistency remains far from solved: the best-performing frontier model perfectly decompiles only 42/600 contracts. We further show that introducing same-model compilation repair substantially improves performance at modest additional cost. SCDBench establishes a common ground for rigorous, reproducible evaluation and aims to accelerate the development of reliable smart contract decompilers for blockchain security and transparency.

2605.29016 2026-05-29 astro-ph.IM astro-ph.CO cs.LG

Three-dimensional Conditional Diffusion Models for Cosmological 21 cm Lightcone Emulation

Bin Xia, John H. Wise

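AI summary To address the difficulty of three-dimensional 21 cm lightcone emulation, the authors compare configurations of preprocessing, dynamic-range compression, architecture depth, and training duration. Yeo-Johnson preprocessing with moderate amplitude compression ranks best on the standard-deviation-normalized mean absolute error of the global signal, but visually plausible samples still show statistical biases.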

详情
AI中文摘要

我们研究了用于三维21厘米光锥模拟的条件扩散模型,重点关注天空平面大小为$64\times64$、视线深度达1024个像素的立方体。与早期的二维研究相比,三维设置更加困难,因为内存限制导致微批次非常小,而底层体素分布高度偏斜且长尾。我们通过使用$25{,}600$个训练光锥和固定参数点的验证集成,对预处理选择、动态范围压缩设置、架构深度和训练时长进行了控制比较。在验证中,每个参考参数点包含800个具有独立初始条件的21cmFAST实现,并且每个模型和每个参考集使用800个样本进行报告的集成比较。我们通过图像和摘要统计空间中的互补诊断评估生成的光锥:亮温度切片、全局信号、功率谱和简化散射系数。在测试的配置中,预处理是控制稳定训练和最终物理保真度的主导因素。在此探索的配置中,Yeo-Johnson预处理结合中等幅度压缩给出了最一致的有利权衡,最强的定量支持来自基于全局信号的标准差归一化平均绝对误差($\mathrm{MAE}_{\rm std}$)的排名,并且在互补诊断中表现出定性一致的行为。同时,视觉上合理的三维样本在两点和高阶统计中仍然保留可测量的偏差。因此,我们将当前工作视为三维21厘米模拟以及未来纳入更真实观测效应的研究的一个模拟级基线。

英文摘要

We investigate conditional diffusion modeling for three-dimensional 21 cm lightcone emulation, focusing on cubes with a sky-plane size of $64\times64$ and a line-of-sight depth up to 1024 cells. Relative to earlier 2D studies, the 3D setting is substantially harder because memory limits enforce very small micro-batches while the underlying voxel distribution is highly skewed and long tailed. We perform controlled comparisons across preprocessing choices, dynamic-range compression settings, architecture depth, and training duration using $25{,}600$ training lightcones and validation ensembles at fixed parameter points. For validation, each reference parameter point contains 800 21cmFAST realizations with independent initial conditions, and we use 800 samples per model and per reference set for the reported ensemble comparisons. We evaluate generated lightcones with complementary diagnostics in both image and summary-statistic spaces: brightness-temperature slices, the global signal, the power spectrum, and reduced scattering coefficients. Across the tested configurations, preprocessing is the dominant factor governing stable training and the resulting physical fidelity. Among the configurations explored here, Yeo-Johnson preprocessing combined with moderate amplitude compression gives the most consistently favorable trade-off, with the strongest quantitative support coming from rankings based on the standard-deviation-normalized mean absolute error ($\mathrm{MAE}_{\rm std}$) of the global signal and qualitatively compatible behavior in the complementary diagnostics. At the same time, visually plausible 3D samples still retain measurable biases in two-point and higher-order statistics. We therefore view the present work as a simulation-level baseline for three-dimensional 21 cm emulation and for future studies that incorporate more realistic observational effects.
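
Yeo-Johnson preprocessing refers to a standard power transform that, unlike Box-Cox, accepts negative inputs, which is useful when a field can take both signs. The sketch below implements the textbook scalar definition in Python. The value of $\lambda$ and the way it is fitted or combined with amplitude compression in the paper are not stated in the abstract, so the `lam` argument here is purely illustrative.

```python
import math


def yeo_johnson(y: float, lam: float) -> float:
    """Textbook Yeo-Johnson power transform of a single value.

    lam = 1 leaves the value unchanged; lam < 1 compresses a long
    positive tail, lam > 1 compresses a long negative tail.
    """
    if y >= 0.0:
        if abs(lam) < 1e-12:
            return math.log1p(y)  # limit of the power form as lam -> 0
        return ((y + 1.0) ** lam - 1.0) / lam
    if abs(lam - 2.0) < 1e-12:
        return -math.log1p(-y)  # limit of the power form as lam -> 2
    return -((1.0 - y) ** (2.0 - lam) - 1.0) / (2.0 - lam)


def transform_field(values, lam):
    """Apply the transform element-wise to a flat list of voxel values."""
    return [yeo_johnson(v, lam) for v in values]
```

The transform is monotonic, so it reshapes a skewed, long-tailed voxel distribution without changing the ordering of values.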

2605.28999 2026-05-29 cs.CR cs.AI cs.CL cs.LG

Measuring Real-World Prompt Injection Attacks in LLM-based Resume Screening

测量基于LLM的简历筛选中真实世界的提示注入攻击

Mohan Zhang, Yuqi Jia, Zhen Tan, Steven Jiang, Neil Zhenqiang Gong, Tianlong Chen, Dawn Song

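AI summary This study presents the first systematic analysis of prompt injection attacks in LLM-based resume screening. Using purpose-built detectors on approximately 200K real-world resumes, it finds that about 1% of resumes contain hidden prompt injections and that their prevalence has increased noticeably in recent years.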

Comments Published in USENIX Security Symposium 2026; Code and artifacts are available at https://github.com/UNITES-Lab/resume-injection-measurement

详情
AI中文摘要

LLM容易受到提示注入攻击。然而,这种漏洞主要是在学术研究中通过概念性演示或少数轶事案例研究来展示的。其在真实世界基于LLM的应用中的普遍性和影响尚未得到充分探索。在这项工作中,我们首次对广泛使用的应用——基于LLM的简历筛选——中的提示注入攻击进行了系统研究。我们的分析基于hireEZ多年来收集的约20万份真实简历。我们首先设计了专门的方法来检测简历中的提示注入。在小规模数据集上的手动验证表明,我们的检测器实现了高精度,并优于最先进的通用检测器。然后,我们将检测器应用于完整的简历数据集,并对真实世界的提示注入攻击进行了全面的测量研究。我们的分析揭示了一些有趣的发现:大约1%的简历包含隐藏的提示注入;这种注入简历的流行度在过去一到两年内显著增加;超过90%的注入提示不使用显式指令。这些结果首次提供了真实世界基于LLM的应用中大规模提示注入的证据,并为未来理解和缓解此类攻击的研究奠定了基础。

英文摘要

LLMs are vulnerable to prompt injection attacks. However, this vulnerability has been primarily demonstrated conceptually in academic studies or through a few anecdotal case studies. Its prevalence and impact in real-world LLM-based applications are largely unexplored. In this work, we present the first systematic study of prompt-injection attacks in a widely used application: LLM-based resume screening. Our analysis is based on approximately 200K real-world resumes collected over multiple years by hireEZ. We first design tailored methods to detect prompt injection in resumes. Manual validation on a small-scale dataset demonstrates that our detectors achieve high precision and outperform state-of-the-art general-purpose detectors. We then apply our detector to the full resume dataset and conduct a comprehensive measurement study of real-world prompt injection attacks. Our analysis reveals several intriguing findings: approximately 1% of resumes contain hidden prompt injections; the prevalence of such injected resumes has increased noticeably over the past one to two years; and more than 90% of injected prompts do not use explicit instructions. These results provide the first evidence of large-scale prompt injection in real-world LLM-based applications and lay the groundwork for future studies to understand and mitigate such attacks.

2605.28980 2026-05-29 math.OC cs.LG cs.NA eess.SP math.NA

Manifold-based Algorithms for the Hadamard Decomposition

基于流形的Hadamard分解算法

Nicolas Gillis, Subhayan Saha, Stefano Sicilia, Arnaud Vandaele

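AI summary For the Hadamard decomposition problem, the paper proposes three new manifold-based algorithms (a Manopt-based method, a block projected gradient method, and a projection-free manifold gradient descent) together with new initialization strategies. The methods are efficient and competitive with the state of the art on synthetic and real data.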

Comments 27 pages, code available from https://github.com/StefanoSicilia/Hadamard-Decomposition

详情
AI中文摘要

给定矩阵 $X$ 和两个秩 $r_1$ 和 $r_2$,Hadamard分解(HD)寻找两个低秩矩阵 $X_1$(秩 $r_1$)和 $X_2$(秩 $r_2$),它们与 $X$ 大小相同,使得 $X\approx X_1\circ X_2$,其中 $\circ$ 是Hadamard(逐元素)乘积。大多数情况下,HD比标准低秩近似(如截断奇异值分解(TSVD))更具表现力,因为它可以用相同数量的参数表示更高秩的矩阵;这是因为 $X_1 \circ X_2$ 的秩通常等于 $r_1 r_2$。本文首先给出HD的一些理论见解,特别是一个有用的重写形式 $X\approx WH^\top$,其中 $W$ 和 $H$ 有 $r_1 r_2$ 列并属于某些流形。这使我们能够开发三种计算HD的新算法。第一种使用表示 $X\approx X_1\circ X_2$ 并依赖于Manopt工具箱。另外两种依赖于重写形式 $X\approx WH^\top$:一种是块投影梯度方法,另一种是基于流形的梯度下降算法,不需要投影到可行集。最后两种算法特别适用于处理大规模稀疏数据。我们还提出了新的初始化策略,以提高HD的精度。我们将我们的算法和初始化策略与TSVD及现有技术进行了比较。数值结果表明,新方法在合成和真实数据上高效且具有竞争力。

英文摘要

Given a matrix $X$, and two ranks $r_1$ and $r_2$, the Hadamard decomposition (HD) looks for two low-rank matrices, $X_1$ of rank $r_1$ and $X_2$ of rank $r_2$, both of the same size as $X$, such that $X\approx X_1\circ X_2$, where $\circ$ is the Hadamard (element-wise) product. In most cases, HD is more expressive than standard low-rank approximations such as the truncated singular value decomposition (TSVD), as it can represent higher-rank matrices with the same number of parameters; this is because the rank of $X_1 \circ X_2$ is generically equal to $r_1 r_2$. In this paper, we first present some theoretical insights for HD, in particular a useful reformulation $X\approx WH^\top$ where $W$ and $H$ have $r_1 r_2$ columns and belong to certain manifolds. These allow us to develop three new algorithms for computing HD. The first one uses the representation $X\approx X_1\circ X_2$ and relies on the Manopt toolbox. The other two rely on the reformulation $X\approx WH^\top$: one is a block projected gradient method, and the other is a manifold-based gradient descent algorithm that does not require projection onto the feasible set. The last two algorithms are particularly effective for handling large sparse data. We also propose new initializations that allow us to improve the accuracy of the HD. We compare our algorithms and initialization strategies with the TSVD and with the state of the art. Numerical results show that the new methods are efficient and competitive on both synthetic and real data.
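
One way to see why a Hadamard product of a rank-$r_1$ and a rank-$r_2$ matrix admits a factorization with $r_1 r_2$ columns is the row-wise Kronecker identity: if $X_1 = A_1 B_1^\top$ and $X_2 = A_2 B_2^\top$, then $X_1 \circ X_2 = WH^\top$ with row $i$ of $W$ equal to the Kronecker product of row $i$ of $A_1$ and row $i$ of $A_2$, and likewise for $H$. The Python sketch below checks this on small integer matrices. Whether this is exactly the reformulation and manifold structure used in the paper is an assumption, and the factor values are arbitrary.

```python
from itertools import product


def matmul_t(a, b):
    """Return a @ b.T for list-of-lists matrices a (m x r) and b (n x r)."""
    return [[sum(x * y for x, y in zip(ra, rb)) for rb in b] for ra in a]


def hadamard(x1, x2):
    """Element-wise product of two equally sized matrices."""
    return [[p * q for p, q in zip(r1, r2)] for r1, r2 in zip(x1, x2)]


def rowwise_kron(a1, a2):
    """Row i of the result is the Kronecker product of a1[i] and a2[i]."""
    return [[p * q for p, q in product(r1, r2)] for r1, r2 in zip(a1, a2)]


# Arbitrary integer factors, so the comparison below is exact.
A1 = [[1, 2], [0, 1], [3, -1]]                      # 3 x r1, r1 = 2
B1 = [[2, 1], [1, 1], [0, 4], [1, -2]]              # 4 x r1
A2 = [[1, 0, 2], [2, 1, 1], [-1, 3, 0]]             # 3 x r2, r2 = 3
B2 = [[1, 1, 0], [0, 2, 1], [3, 0, 1], [1, 1, 1]]   # 4 x r2

X1 = matmul_t(A1, B1)
X2 = matmul_t(A2, B2)
W = rowwise_kron(A1, A2)    # 3 x (r1 * r2)
H = rowwise_kron(B1, B2)    # 4 x (r1 * r2)

lhs = hadamard(X1, X2)      # X1 ∘ X2
rhs = matmul_t(W, H)        # W H^T
```

The structured rows of `W` and `H` are what distinguish this from an unconstrained rank-$r_1 r_2$ factorization: the factors carry only the parameters of $A_1, A_2, B_1, B_2$.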

2605.28961 2026-05-29 stat.ML cs.LG math.OC

Dynamics of Stochastic Momentum with Sparse Updates in High Dimensions

高维稀疏更新下随机动量的动力学

Katie Everett, Elliot Paquette

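AI summary Using a least squares model and a logistic regression model, the paper theoretically analyzes momentum dynamics under sparse updates. It shows that the phase structure is governed by the ratio of the momentum retention timescale to the learning timescale, and that oscillatory dynamics arise at different momentum values for different token sparsities, creating a spectral conflict for a single global momentum.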

详情
AI中文摘要

现有的动量理论假设梯度以大致恒定的速率到达每个参数,但这一假设在重尾数据分布和现代架构中常被违反。我们理论分析了稀疏更新下两种可处理动量模型的动力学:具有稀疏输入的最小二乘模型和具有稀有类别的逻辑回归模型。两者都给出了精确的闭式二阶矩动力学,我们针对稀疏性、批量大小和动量衰减的三个标度指数刻画了其高维极限。两个问题上的相结构由两个内在时间尺度之比决定:动量保留时间尺度(缓冲区存活的活动更新次数)和学习时间尺度(减少平方误差所需的活动更新次数)。当学习远慢于保留时,极限匹配SGD;当学习更快时,系统不稳定;当时间尺度相当时,我们恢复经典的重球动力学。振荡动力学发生在不同令牌稀疏度的不同动量值处,从而在全局动量上产生跨令牌频率的谱冲突。

英文摘要

Existing theory of momentum assumes that gradients arrive at every parameter at a roughly constant rate, an assumption violated in practice by heavy-tailed data distributions and modern architectures. We theoretically analyze the dynamics of two tractable models of momentum under sparse updates: a least squares model with sparse inputs and a logistic regression model with a rare class. Both admit exact closed-form second-moment dynamics whose high-dimensional limits we characterize across three scaling exponents for sparsity, batch size, and momentum decay. The phase structure on both problems is governed by the ratio of two intrinsic timescales: a momentum retention timescale (how many active updates the buffer survives) and a learning timescale (how many active updates it takes to reduce the squared error). When learning is much slower than retention, the limit matches SGD; when learning is faster, the system is unstable; where the timescales coincide, we recover classical heavy-ball dynamics. The oscillatory dynamics occur at different momentum values for different token sparsity, creating a spectral conflict for global momentum across token frequencies.

2605.28940 2026-05-29 hep-ph cs.LG hep-ex physics.data-an

Neural Scaling Laws for Jet Generation

喷注生成的神经缩放定律

Oz Amram, Darius A. Faroughy, Tjarko Gerdes, Anna Hallin, Gregor Kasieczka, Michael Krämer, Humberto Reyes-Gonzalez, David Shih

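AI summary The paper is the first to explore scaling laws for particle jet generation. It reproduces the logarithmic scaling-law behavior for model-size scaling and shows that the next-token-prediction validation loss is monotonically related to physics performance.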

详情
AI中文摘要

最近观察到的经验缩放定律描述了基础模型在三个独立关键量(数据集大小、计算量和模型参数)变化时的性能。提取这些缩放定律有助于训练大型复杂模型,因为传统方式调优超参数不可行。本文首次探索缩放定律是否也适用于粒子喷注生成任务——该任务既作为基础模型的预训练目标,也作为原位模拟本身。我们确实复制了模型大小缩放的关键对数缩放定律行为。除了研究生成模型的下一个标记预测验证损失,我们还研究了五个物理量的切片Wasserstein距离,这些物理量在训练期间模型无法直接获得。我们的研究表明,该量与下一个标记预测验证损失单调相关,意味着该损失确实是物理性能的良好代理。对于数据集大小和计算量的缩放,我们观察到损失和切片Wasserstein距离的缩放行为明显较弱。我们通过引入可学习窗口的概念分析这种行为,并认为喷注成分的自回归下一个标记预测相对于语言模型研究表现出较快的饱和。我们讨论了这种行为的可能起源,包括QCD辐射的随机性以及生成式与监督式学习任务在碰撞物理中的差异。

英文摘要

Recently observed empirical scaling laws describe the performance of foundation-type models as three independent key quantities -- dataset size, compute, and model parameters -- are modified. Extracting these scaling laws informs the training of large complex models for which the tuning of hyperparameters in traditional ways is not feasible. This work for the first time explores if scaling laws can also be observed for the task of particle jet generation -- both relevant as a pre-training objective for foundation models and as in-situ simulation by itself. We indeed replicate the key logarithmic scaling law behavior for model-size scaling. Beyond studying the next token prediction validation loss of the generative model, we also study the sliced Wasserstein distance of five physical quantities that are not immediately available to the model during training. Our study shows that this quantity is monotonically related to the next token prediction validation loss, meaning that this loss is indeed a good proxy for the physics performance. For the scaling with dataset size and compute, we observe substantially weaker scaling behavior of both the loss and the sliced Wasserstein distance. We analyze this behavior by introducing the concept of a learnable window, and argue that autoregressive next token prediction on jet constituents exhibits comparatively rapid saturation relative to language-model studies. We discuss possible origins of this behavior, including the stochastic nature of QCD radiation and differences between generative and supervised learning tasks in collider physics.
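
The sliced Wasserstein distance compares two multi-dimensional samples by projecting both onto random directions and averaging the resulting one-dimensional Wasserstein distances, which have a closed form through sorting. A minimal Python sketch is given below. It assumes equal sample sizes, the order-1 distance, and a fixed number of random projections; the abstract does not say which variant or settings the authors use for their five physical quantities.

```python
import math
import random


def wasserstein_1d(a, b):
    """W1 between two equal-size 1D samples: mean gap between sorted values."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def random_unit_vector(dim, rng):
    """Draw a direction uniformly on the unit sphere via normalized Gaussians."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            return [c / norm for c in v]


def sliced_wasserstein(xs, ys, n_projections=64, seed=0):
    """Average 1D W1 distance over random projections of two point sets."""
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(n_projections):
        theta = random_unit_vector(dim, rng)
        px = [sum(t * c for t, c in zip(theta, x)) for x in xs]
        py = [sum(t * c for t, c in zip(theta, y)) for y in ys]
        total += wasserstein_1d(px, py)
    return total / n_projections
```

Here each row of `xs` would hold the physical quantities computed for one generated jet and each row of `ys` those of one reference jet.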

2605.28914 2026-05-29 cs.CR cs.AI

AIRGuard: Guarding Agent Actions with Runtime Authority Control

AIRGuard:通过运行时权限控制守护智能体行为

Suliu Qin, Haomin Zhuang, Yujun Zhou, Yufei Han, Xiangliang Zhang

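AI summary To address authority confusion in tool-using language agents, the paper proposes AIRGuard, a runtime guard that enforces least privilege through action-time authorization. It reduces attack success on AgentTrap from 36.3% to 5.5% (Sonnet 4.6) while preserving 76.0% benign utility on DTAP-150 (Haiku 4.5).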

详情
AI中文摘要

使用工具的语言智能体将模型决策转化为外部副作用:它们读取文件、运行脚本、调用API、发送消息以及调用模型上下文协议工具。这使得针对智能体的攻击不同于越狱攻击。有害步骤往往不是明显禁止的输出,而是普通的可执行动作,但由于攻击者控制的上下文将授权访问导向违背用户利益的方向而变得不安全。我们将这种失败模式识别为权限混淆:不可信资源可以告知推理,但绝不能授权副作用。我们提出AIRGuard,一种运行时守卫,将最小权限原则实现为动作时授权。AIRGuard规范化异构工具调用,将任务权限推导为步骤级权限,跟踪源和目标信任度,模拟敏感副作用,审计跨步骤风险,并在动作执行前强制执行决策。在AgentTrap上,AIRGuard将Sonnet 4.6的攻击成功率从无防御时的36.3%降低到5.5%。在DTAP-150上,AIRGuard在Haiku 4.5上保持了76.0%的良性效用,而ARGUS为52.0%,MELON为42.0%。消融实验进一步表明,仅靠提示策略效果有限,而专用的运行时权限控制层为智能体系统提供了对工具介导副作用的直接控制。代码和数据可在https://github.com/Sophie508/AIRGuard获取。

英文摘要

Tool-using language agents turn model decisions into external side effects: they read files, run scripts, call APIs, send messages, and invoke Model Context Protocol tools. This makes agent attacks different from jailbreaks. The harmful step is often not an obviously forbidden output, but an ordinary executable action that becomes unsafe because attacker-controlled context steers authorized access against the user's interest. We identify this failure mode as authority confusion: untrusted resources may inform reasoning, but they must not authorize side effects. We present AIRGuard, a runtime guard that operationalizes least privilege as action-time authorization. AIRGuard normalizes heterogeneous tool calls, derives task authority into step-level authority, tracks source and target trust, simulates sensitive side effects, audits cross-step risk, and enforces decisions before actions execute. On AgentTrap, AIRGuard reduces Sonnet 4.6 attack success from 36.3% without defense to 5.5%. On DTAP-150, AIRGuard preserves 76.0% benign utility with Haiku 4.5, compared with 52.0% for ARGUS and 42.0% for MELON. An ablation further shows that prompt-only policy helps only modestly, whereas a dedicated runtime authority-control layer gives the agent system direct control over tool-mediated side effects. Code and data are available at https://github.com/Sophie508/AIRGuard.

2605.28899 2026-05-29 cs.CR cs.AI

Quantum-Enhanced Adversarial Robustness in Artificial Intelligence

人工智能中的量子增强对抗鲁棒性

Jaydip Sen

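AI summary This chapter surveys the intersection of adversarial machine learning and quantum computing and presents conceptual frameworks that use quantum optimization, feature mapping, and hybrid quantum-classical architectures to strengthen the adversarial robustness of AI systems.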

Comments This is the pre-print of the chapter which has been accepted for publication in the edited volume titled "Quantum Enhancements to the AI Industry", edited by Eduard Babulak. The volume will be published by IGI Global, USA. This is not the final version of the chapter published in the book

详情
AI中文摘要

人工智能在多个应用领域取得了显著成功。然而,其对对抗性攻击的脆弱性给可靠性、安全性和可信赖性带来了重大挑战。对抗性机器学习表明,即使是高精度的模型也可能通过精心设计的扰动被操纵,这在医疗、金融和自主技术等安全关键系统中引发了严重担忧。与此同时,量子计算作为一种变革性范式出现,能够通过叠加、纠缠和量子干涉等原理解决复杂的计算问题。这两个领域的融合催生了量子人工智能的出现,该领域探索量子技术如何增强学习效率、可扩展性和鲁棒性。本章全面概述了对抗性机器学习和现有防御策略,随后对量子计算和量子机器学习模型进行了易于理解的介绍。进一步提出了量子增强对抗鲁棒性的概念框架,强调了量子优化、特征映射和混合量子-经典架构。还讨论了实际应用、关键挑战和未来研究方向,以支持安全可信赖的AI系统的开发。

英文摘要

Artificial Intelligence has achieved remarkable success across diverse application domains. However, its vulnerability to adversarial attacks poses significant challenges to reliability, security, and trustworthiness. Adversarial machine learning demonstrates that even highly accurate models can be manipulated through carefully crafted perturbations, raising serious concerns in safety-critical systems such as healthcare, finance, and autonomous technologies. In parallel, quantum computing has emerged as a transformative paradigm capable of addressing complex computational problems through principles such as superposition, entanglement, and quantum interference. The convergence of these fields has led to the emergence of quantum artificial intelligence, which explores how quantum techniques can enhance learning efficiency, scalability, and robustness. This chapter provides a comprehensive overview of adversarial machine learning and existing defense strategies, followed by an accessible introduction to quantum computing and quantum machine learning models. It further presents conceptual frameworks for quantum-enhanced adversarial robustness, emphasizing quantum optimization, feature mapping, and hybrid quantum-classical architectures. Practical applications, key challenges, and future research directions are also discussed to support the development of secure and trustworthy AI systems.

2605.28890 2026-05-29 cs.CR cs.LG

Echoes within the Reasoning: Stealthy and Effective Watermarking via Chain of Thought

推理中的回声:通过思维链实现隐蔽且有效的数字水印

Jiacheng Lu, Yiming Li, Tao Song, Weijian Wang, Wenjie Qu, Haibing Guan, Jiaheng Zhang

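AI summary The paper proposes BiCoT, a framework that embeds the watermark into the internal geometry of reasoning traces and verifies it with RSR, a black-box verifier based on top log-probabilities, achieving robust watermark detection while preserving reasoning fidelity.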

Comments This paper is accepted by ICML2026

详情
AI中文摘要

具有思维链推理能力的大型语言模型代表有价值的知识产权,然而现有的黑盒水印方法通常通过扰动最终答案或依赖脆弱的触发模式,在鲁棒性和推理保真度之间进行权衡。我们提出BiCoT,一种水印框架,通过将高显著性结构锚点与私有签名子空间对齐,同时正则化普通控制令牌以保留语义容量,将所有权信号嵌入推理轨迹的内部几何结构。这种设计将水印与推理相关表示耦合,使得在不破坏支持连贯推理的特征的情况下难以移除水印。为了在模型窃取和表示漂移下实现验证,我们引入鲁棒子空间注册(RSR),一种基于Top-logprob的黑盒验证器,使用哨兵令牌校准输出分布中的系统性偏移。实验表明,BiCoT在多种复杂推理任务中保持推理保真度,同时在微调、量化、模型级扰动以及自适应输出级攻击下,在域内和域外设置中实现鲁棒检测。

英文摘要

Large Language Models with Chain-of-Thought reasoning capabilities represent valuable intellectual property, yet existing black-box watermarking methods often trade robustness for reasoning fidelity by perturbing final answers or relying on fragile trigger patterns. We propose BiCoT, a watermarking framework that embeds ownership signals into the internal geometry of reasoning traces by aligning high-saliency structural anchors with a private signature subspace while regularizing ordinary control tokens to preserve semantic capacity. This design couples the watermark with reasoning-relevant representations, making removal difficult without disrupting the features that support coherent reasoning. To enable verification under model theft and representation drift, we introduce Robust Subspace Registration (RSR), a Top-logprob-based black-box verifier that uses sentinel tokens to calibrate systematic shifts in the output distribution. Experiments show that BiCoT preserves reasoning fidelity across diverse complex reasoning tasks while achieving robust detection under fine-tuning, quantization, model-level perturbations, and adaptive output-level attacks across in-domain and out-of-distribution settings.

2605.28888 2026-05-29 cs.IR cs.LG

Generative Spatiotemporal Intent Sequence Recommendation via Implicit Reasoning in Amap

高德地图中基于隐式推理的生成式时空意图序列推荐

Sicong Wang, Ruiting Dong, Yue Liu, Bowen Zheng, Jun Meng, Jie Li, Shuaijun Guo, Yu Gu, Fanyi Di, Xin Li

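AI summary The paper proposes GPlan, a framework that uses Progressive Implicit CoT Distillation and Spatiotemporal Counterfactual DPO to internalize LLM reasoning into lightweight models, enabling low-latency generation of intent sequences that respect spatiotemporal constraints.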

Comments 9 pages, 1 figure

详情
AI中文摘要

现实世界中的用户行为很少由孤立动作组成;相反,它通常形成由时空依赖关系支配的意图流。为了提供集成服务推荐,我们聚焦于生成式时空意图序列推荐(GSISR)任务,旨在生成在复杂时空上下文中逻辑连贯且物理可执行的意图序列。虽然LLMs为GSISR提供了强大的推理潜力,但直接工业部署受到高推理延迟以及上下文不匹配或物理不可行计划的限制。为应对这些挑战,我们提出生成式框架GPlan,通过两个组件将LLM推理内化到轻量模型中。首先,为了在严格的延迟约束下实现推理,我们引入渐进式隐式思维链蒸馏,将显式推理过程压缩到保留的潜在令牌中,使小模型能够继承复杂的规划逻辑而无需生成长推理文本。其次,为了解决通用知识与现实世界约束之间的脱节,我们设计了时空反事实DPO。通过将模型与反事实上下文-计划对对齐,我们提高了对时空上下文的敏感性并减少了上下文不匹配的计划。离线实验和在线A/B测试表明,我们的方法提高了序列连贯性和上下文响应性。我们的实现和匿名化的GSISR数据集可在https://github.com/alibaba/GPlan获取。

英文摘要

Real-world user behavior rarely consists of isolated actions; instead, it often forms intent flows governed by spatiotemporal dependencies. To provide integrated service recommendations, we focus on the task of Generative Spatiotemporal Intent Sequence Recommendation (GSISR), which aims to generate intent sequences that are logically coherent and physically executable within complex spatiotemporal contexts. While LLMs offer strong reasoning potential for GSISR, direct industrial deployment is limited by high inference latency and context-mismatched or physically infeasible plans. To address these challenges, we propose a generative framework, GPlan, that internalizes LLM reasoning into lightweight models through two components. First, to enable reasoning under strict latency constraints, we introduce Progressive Implicit CoT Distillation, which compresses explicit reasoning processes into reserved latent tokens, allowing small models to inherit complex planning logic without generating long reasoning text. Second, to address the disconnect between general knowledge and real-world constraints, we design Spatiotemporal Counterfactual DPO. By aligning the model with counterfactual context-plan pairs, we improve sensitivity to spatiotemporal context and reduce context-mismatched plans. Offline experiments and online A/B testing demonstrate that our approach improves sequence coherence and context responsiveness. Our implementation and the anonymized GSISR dataset are available at https://github.com/alibaba/GPlan.

2605.28886 2026-05-29 q-bio.QM cs.LG

Computational Modeling of Antibody-Antigen Complexes: PLM-Based and MSA-Based Approaches

抗体-抗原复合物的计算建模:基于PLM和基于MSA的方法

Xiao Luo

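AI summary This thesis investigates why antibody-related tasks are computationally harder and proposes improvements along two complementary directions, one based on protein language models (PLMs) and one based on multiple sequence alignments (MSAs), to improve antibody and antibody-antigen structure prediction.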

Comments PhD thesis

详情
AI中文摘要

抗体通过特异性识别和中和抗原在免疫反应中发挥核心作用,治疗性抗体已成为癌症和自身免疫疾病的主要药物。然而,其发现仍依赖大量体外筛选,而抗体结构和抗体-抗原相互作用的准确计算建模可以优先候选、减少实验负担并加速理性设计。尽管近年来高精度蛋白质和复合物预测取得了进展,但与一般蛋白质-蛋白质相互作用相比,抗体相关任务仍存在持续的性能差距,限制了下游设计。 本论文研究了为何抗体相关任务更困难,并沿两个互补方向提出改进。首先,我们研究了基于蛋白质语言模型(PLM)的抗体及抗体-抗原结构预测方法。利用多个PLM的嵌入,我们的方法在抗体单体预测中达到了所比较的PLM方法中最高的CDR-H3精度。将其扩展到复合物预测时未能泛化:由于缺乏抗体和抗原之间的共进化信号,单序列PLM表示无法可靠识别结合界面。 其次,我们针对抗体-抗原复合物预测开发了两种基于MSA的干预措施:MSA精炼,结合了CDR聚焦过滤和从更大序列数据库恢复深度;以及收敛感知循环,选择稳定的中间循环状态用于最终扩散采样。这些干预措施在保留的抗体-抗原测试集上相对于AlphaFold3基线提供了一致的增益。由于这些方法修改了MSA构建和循环行为而非模型参数,它们无需重新训练或权重访问即可应用。

英文摘要

Antibodies play a central role in the immune response by specifically recognizing and neutralizing antigens, and therapeutic antibodies have become major drugs for cancer and autoimmune diseases. However, their discovery still relies on extensive in vitro screening, and accurate computational modeling of antibody structures and antibody-antigen interactions can prioritize candidates, reduce experimental burden, and accelerate rational design. Despite recent advances in high-accuracy protein and complex prediction, a persistent performance gap remains for antibody-related tasks compared with general protein-protein interactions, limiting downstream design. This thesis investigates why antibody-related tasks are harder and proposes improvements along two complementary directions. First, we investigate protein language model (PLM)-based methods for antibody and antibody-antigen structure prediction. Using embeddings from multiple PLMs, our approach achieves the best CDR-H3 accuracy among compared PLM-based methods on antibody monomer prediction. Extending it to complex prediction does not generalize: without co-evolutionary signals between antibody and antigen, single-sequence PLM representations do not reliably identify binding interfaces. Second, we develop two MSA-based interventions for antibody-antigen complex prediction: MSA refinement, which combines CDR-focused filtering with depth recovery from a larger sequence database, and convergence-aware recycling, which selects a stable intermediate recycle state for final diffusion sampling. Together, these interventions provide consistent gains over the AlphaFold3 baseline on a held-out antibody-antigen test set. Because the methods modify MSA construction and recycling behavior rather than model parameters, they apply without retraining or weight access.

2605.28876 2026-05-29 cs.SE cs.AI

LogDx-CI: Benchmarking Log Reduction Tools for LLM Root-Cause Diagnosis

LogDx-CI:为LLM根因诊断基准测试日志缩减工具

Bowen Qin

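AI summary The paper introduces the LogDx-CI benchmark, which compares 11 log-reduction tools on 35 real CI failure cases. Hybrid grep+tail routers dominate the cost-quality trade-off; an agent loop narrows the quality gap across tools, but cost differences persist; and a cross-family LLM summarizer-debugger pair outperforms the same-family pair.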

详情
AI中文摘要

CI失败日志规模大(本语料中位数5000行,最大20万行)且噪声多。尝试调试的编码智能体依赖上游工具将日志缩减为可管理的上下文,但该领域缺乏公开的经验比较来评估哪些缩减能为下游LLM诊断保留足够证据。我们引入LogDx-CI基准,比较11种上下文缩减工具(原始、尾部、grep、三种RTK模式、两种真实LLM map-reduce摘要器、三种混合路由器)在35个真实GitHub Actions失败案例上的表现,由3个LLM调试器家族(Claude Haiku 4.5、Claude Sonnet 4.6、OpenAI gpt-5-mini)以及一个Sonnet 4.6工具使用智能体评分。我们报告三个重要发现。(1)混合grep+tail路由器主导成本-质量帕累托前沿;前两种方法得分0.670/0.666,每案例约0.03美元,质量与独立grep相当但令牌数减少4.5倍。(2)在智能体循环场景中,不同缩减工具的质量范围缩小7倍(单次得分跨度0.42 → 智能体循环跨度0.059);智能体通过后续工具调用挽救弱上下文。然而,成本差异持续存在:弱上下文迫使智能体发出2-4倍的工具调用来恢复。(3)跨家族LLM摘要-调试器对(gpt-5-mini摘要器供给Claude Haiku调试器)在四个诊断变体上的平均得分比同家族对高0.071,否定了该任务上的自我调用偏差假设。gpt-5-mini摘要器也是智能体循环中的第一名方法(得分0.749),每案例0.37次工具调用,且缩减器成本比Haiku摘要器低10倍(每案例0.18美元 vs 1.75美元)。所有数据、代码、每个案例的捆绑包和可复现性基础设施均已公开。

英文摘要

CI failure logs are large (median 5k lines, max 200k in this corpus) and noisy. Coding agents that try to debug them depend on an upstream tool to reduce the log to a manageable context, but the field has had no public empirical comparison of which reductions preserve enough evidence for downstream LLM diagnosis. We introduce LogDx-CI, a benchmark that compares 11 context-reduction tools (raw, tail, grep, three RTK modes, two real LLM map-reduce summarizers, three hybrid routers) on 35 real GitHub Actions failure cases, scored by 3 LLM debugger families (Claude Haiku 4.5, Claude Sonnet 4.6, OpenAI gpt-5-mini) plus a Sonnet 4.6 tool-using agent. We report three load-bearing findings. (1) Hybrid grep+tail routers dominate the cost-quality Pareto frontier; the top two methods score 0.670 / 0.666 at $\sim$\$0.03 per case, same-ballpark quality as standalone grep at $4.5\times$ fewer tokens. (2) In the agent-loop regime, the quality range across reduction tools collapses $7\times$ (single-shot spread 0.42 $\to$ agent-loop spread 0.059); the agent rescues weak contexts via follow-up tool calls. However, cost differences persist: weak contexts force the agent to issue 2-4$\times$ more tool calls to recover. (3) A cross-family LLM-summary pair (gpt-5-mini summarizer feeding a Claude Haiku debugger) beats the same-family pair by $+0.071$ averaged across four diagnoser variants, falsifying the self-call-bias hypothesis on this task. The gpt-5-mini summarizer is also the agent-loop #1 method (score 0.749) at $0.37$ tool-calls per case and $10\times$ lower reducer cost than the Haiku summarizer (\$0.18 vs \$1.75 per case). All data, code, per-case bundles, and reproducibility infrastructure are public.
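
A tool sits on the cost-quality Pareto frontier when no other tool is at least as cheap and at least as good while being strictly better on one of the two. A small Python sketch of that dominance test follows. The tool names and (cost, score) pairs in the example are made-up placeholders for illustration, not values from the benchmark.

```python
def pareto_frontier(points):
    """Return names of methods not dominated in (lower cost, higher score).

    points: iterable of (name, cost_per_case, score) tuples.
    """
    points = list(points)
    frontier = []
    for name, cost, score in points:
        dominated = any(
            (c <= cost and s >= score) and (c < cost or s > score)
            for other, c, s in points
            if other != name
        )
        if not dominated:
            frontier.append(name)
    return frontier


# Hypothetical example data (not taken from LogDx-CI).
TOY_TOOLS = [
    ("router_a", 0.03, 0.67),
    ("grep_like", 0.10, 0.66),
    ("summarizer", 0.50, 0.75),
    ("raw_log", 0.60, 0.70),
]
```

With these placeholder numbers, `pareto_frontier(TOY_TOOLS)` keeps the cheap router and the high-scoring summarizer and drops the two tools that are both costlier and weaker than an alternative.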

2605.28861 2026-05-29 cond-mat.str-el cond-mat.dis-nn cs.LG

Comment on "Spin-1/2 Kagome Heisenberg Antiferromagnet: Machine Learning Discovery of the Spinon Pair-Density-Wave Ground State"

评论:自旋-1/2 Kagome海森堡反铁磁体:通过机器学习发现自旋子对密度波基态

Helia Kamal, Dominik Kufel, DinhDuy Vu, Chris R. Laumann, Norman Y. Yao

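AI summary The Comment argues that, in the study of the kagome Heisenberg antiferromagnet ground state with group-equivariant convolutional neural networks, the reported low energies are artifacts of broken ergodicity caused by single-spin-flip updates in the Metropolis-Hastings sampling. With spin-exchange updates, the network converges to energies higher than the DMRG results, which calls the original conclusions into question.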

Comments 3 pages, 1 figure; Comment on arXiv:2401.02866

详情
AI中文摘要

最近的一篇文章[Phys. Rev. X 15, 011047 (2025)]利用群等变卷积神经网络研究了kagome海森堡反铁磁体的基态。在迄今为止研究的最大的有限尺寸团簇($N=108$)上,作者报告了显著低于其他数值方法(包括最先进的密度矩阵重正化群(DMRG)计算)的变分能量。与先前暗示可能存在自旋液体基态的结果相反,作者观察到了自旋子对密度波基态。我们发现:(i)报告的低能量是Metropolis-Hastings采样中遍历性破缺的伪影,因为作者使用的单自旋翻转更新规则实际上冻结了马尔可夫链;(ii)当通过自旋交换更新强制执行遍历采样时,神经网络收敛到显著高于现有DMRG结果的能量,这使该论文的主张受到质疑。

英文摘要

A recent article [Phys. Rev. X 15, 011047 (2025)] utilizes group-equivariant convolutional neural networks to study the ground state of the kagome Heisenberg antiferromagnet. On the largest finite-size cluster studied to date ($N=108$), the authors report variational energies significantly lower than other numerical methods, including state-of-the-art density matrix renormalization group (DMRG) calculations. In contrast to previous results suggesting a possible spin-liquid ground state, the authors observe a spinon pair-density-wave ground state. We find that: (i) the reported low energies are artifacts of broken ergodicity in the Metropolis-Hastings sampling, since the single-spin-flip update rule utilized by the authors effectively freezes the Markov chains; and (ii) when ergodic sampling is enforced via spin-exchange updates, the neural network converges to energies significantly higher than existing DMRG results, calling the paper's claims into question.

2605.28858 2026-05-29 cs.CE cs.LG math-ph math.MP

An End-to-End PyTorch Interface for Differentiable PDE Solvers: A RANS Model-Correction Study

可微PDE求解器的端到端PyTorch接口:一项RANS模型校正研究

Luca Saverio, Michele Alessandro Bucci, Gianmarco Farro, Cédric Content, Denis Sipp

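AI summary The paper presents an end-to-end differentiable machine learning framework that integrates the PDE into PyTorch as an implicit layer and optimizes a parametrized correction term, with applications to data assimilation and closure modeling, demonstrated on the RANS equations for compressible flows.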

详情
AI中文摘要

本工作提出了一种在完全可微的机器学习框架内求解偏微分方程约束反问题的端到端策略。所提出的公式提供了一种统一且用户友好的方法,适用于从数据同化到闭合建模的广泛问题。我们的方法结合了一个基线可微PDE求解器(从非线性系统$R(w) = 0$预测状态$w$)和一个通用的加性、参数化、可微校正$f_ϕ(w)$,其可训练参数为$ϕ$。我们展示了如何通过将PDE重新表述为隐层,将其集成到任意目标函数中,同时利用PyTorch的自动微分图,在完全可微的Python工作流中优化phi。该方法在可压缩流的雷诺平均纳维-斯托克斯方程上进行了演示,其中闭合项或其一部分使用可训练参数或神经网络建模。第一个应用考虑了二维NASA壁装驼峰测试案例,其中生产项参数针对时间平均LES数据进行了优化。第二个应用在VKI LS-59涡轮叶片上进行,其中通过优化可训练空间场重建了Spalart-Allmaras涡粘性场。使用可微BROADCAST求解器和Spalart-Allmaras湍流模型,从VKI LS-59涡轮叶片几何形状生成数据集。结果突出了该框架的灵活性,展示了其超越湍流建模,适用于更广泛的物理信息PDE约束问题(具有数据驱动组件)的适用性。

英文摘要

This work presents an end-to-end strategy for solving inverse problems constrained by Partial Differential Equations within a fully differentiable Machine Learning framework. The proposed formulation provides a unified and user-friendly methodology applicable to a wide range of problems, from data assimilation to closure modeling. Our approach combines a baseline differentiable PDE solver, which predicts the state $w$ from the nonlinear system $R(w) = 0$, with a generic additive, parametrized, and differentiable correction $f_\phi(w)$, with trainable parameters $\phi$. We show how to optimize $\phi$ within a fully differentiable Python workflow by reformulating the PDE as an implicit layer, enabling its integration into arbitrary objective functions, while leveraging PyTorch's automatic differentiation graph. The method is demonstrated on the Reynolds-Averaged Navier-Stokes equations for compressible flows, where the closure term, or a portion of it, is modeled using trainable parameters or a Neural Network. The first application considers the 2D NASA Wall-Mounted Hump test case, where a production-term parameter is optimized against time-averaged LES data. A second application is carried out on the VKI LS-59 turbine blade, where the Spalart-Allmaras eddy viscosity field is reconstructed through the optimization of a trainable spatial field. A dataset is generated starting from the VKI LS-59 turbine blade geometry using the differentiable BROADCAST solver with the Spalart-Allmaras turbulence model. The results highlight the flexibility of the framework, showing its applicability beyond turbulence modeling to a broader class of physics-informed PDE-constrained problems with data-driven components.
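
Treating the PDE as an implicit layer usually means that gradients with respect to the parameters come from the implicit function theorem rather than from differentiating through solver iterations. The scalar Python sketch below illustrates this under the assumption that the corrected system has the form $R(w) + f_\phi(w) = 0$; the toy residual $R(w) = w^3 + w - 2$, the correction $f_\phi(w) = \phi\,w$, and all function names are hypothetical and are not the paper's RANS setup or its PyTorch interface.

```python
def residual(w: float, phi: float) -> float:
    """Toy corrected system G(w, phi) = R(w) + f_phi(w)."""
    return w**3 + w - 2.0 + phi * w


def d_residual_dw(w: float, phi: float) -> float:
    """Partial derivative of G with respect to the state w."""
    return 3.0 * w**2 + 1.0 + phi


def d_residual_dphi(w: float, phi: float) -> float:
    """Partial derivative of G with respect to the parameter phi."""
    return w


def solve_state(phi: float, w0: float = 1.0, tol: float = 1e-12,
                max_iter: int = 50) -> float:
    """Forward pass: Newton iteration for G(w, phi) = 0."""
    w = w0
    for _ in range(max_iter):
        step = residual(w, phi) / d_residual_dw(w, phi)
        w -= step
        if abs(step) < tol:
            break
    return w


def loss(w: float, w_target: float) -> float:
    """Misfit between the solved state and a reference value."""
    return 0.5 * (w - w_target) ** 2


def loss_gradient(phi: float, w_target: float) -> float:
    """Backward pass: dJ/dphi using dw/dphi = -(dG/dw)^-1 * dG/dphi."""
    w = solve_state(phi)
    dw_dphi = -d_residual_dphi(w, phi) / d_residual_dw(w, phi)
    return (w - w_target) * dw_dphi
```

The gradient costs one extra linear solve with the Jacobian at the converged state, regardless of how many Newton steps the forward pass needed; in a PDE setting that scalar division becomes a linear (adjoint) system.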

2605.28853 2026-05-29 q-fin.PM cs.LG

Financially Guided Deep Portfolio Optimization

财务引导的深度投资组合优化

Rahul Fernandes, Travis Desell

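AI summary The paper proposes an end-to-end framework in which neural networks learn portfolio weights by directly optimizing differentiable surrogates of key financial metrics: Sharpe ratio, Omega ratio, Conditional Value-at-Risk (CVaR), and Risk Parity. On 50 S&P 500 stocks from 2007 to 2023, the best model (an AttentionLSTM with the Omega-CVaR-RiskParity loss) achieves an annualized Sharpe ratio of 0.29 and a total compounded return of +7.86% in the 2022-2023 out-of-sample test, outperforming the S&P 500 by 12.38 percentage points.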

详情
AI中文摘要

由于非平稳性、噪声数据和高交易成本,现实金融市场中的投资组合优化极其困难。标准的预测-然后优化方法首先预测收益,然后求解权重,这加剧了预测误差,并且常常在制度转换下失败。我们提出一个端到端框架,直接优化关键财务指标——夏普比率、Omega比率、条件风险价值(CVaR)和风险平价——的可微代理,使得神经网络能够通过反向传播学习投资组合权重。我们的扩展窗口滚动前向程序,应用于2007年至2023年的50只标普500股票,包含了现实的买卖价差成本,并每季度再平衡。在具有挑战性的样本外测试期(2022-2023年),最佳模型——使用Omega-CVaR-RiskParity损失的AttentionLSTM——实现了年化夏普比率0.29和总复合收益+7.86%,而标普500指数总收益为-4.52%,年化夏普比率为-0.02。这比标普500指数高出12.38个百分点(相对改进超过270%),同时保持尾部风险(CVaR)几乎不变。该框架持续优于等权重投资组合、标普500指数以及传统方法(MVP、HRP、NCO),表明将财务目标直接嵌入模型训练能够在不利市场条件下产生稳健、经济上有意义的超额收益。

英文摘要

Portfolio optimization in real-world financial markets is notoriously difficult due to non-stationarity, noisy data, and high transaction costs. Standard predict-then-optimize methods first forecast returns and then solve for weights, compounding prediction errors and often failing under regime shifts. We propose an end-to-end framework that directly optimizes differentiable surrogates of key financial metrics - Sharpe ratio, Omega ratio, Conditional Value-at-Risk (CVaR), and Risk Parity - allowing neural networks to learn portfolio weights via backpropagation. Our expanding-window walk-forward procedure, applied to 50 S&P 500 stocks from 2007 to 2023, incorporates realistic bid-ask spread costs and rebalances quarterly. On the challenging out-of-sample test period (2022-2023), the best model - an AttentionLSTM with the Omega-CVaR-RiskParity loss - achieves an annualized Sharpe of 0.29 and a total compounded return of +7.86%, while the S&P 500 delivers -4.52% total return and an annualized Sharpe of -0.02. This outperforms the S&P 500 by 12.38 percentage points (a relative improvement of over 270%), while keeping tail risk (CVaR) nearly unchanged. The framework consistently outperforms the equal-weight portfolio, S&P 500, and traditional methods (MVP, HRP, NCO), demonstrating that embedding financial objectives directly into model training yields robust, economically meaningful outperformance even in adverse market conditions.
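
For readers unfamiliar with the metrics named in the loss, the Python sketch below gives plain reference definitions of the Sharpe ratio, Omega ratio, and CVaR on a return series, and re-derives the outperformance figures quoted in the abstract. The paper optimizes differentiable surrogates of these quantities; the exact surrogate forms, the CVaR level, and the annualization factor are not given in the abstract, so the defaults below are assumptions. The "over 270%" figure is assumed here to be the gap divided by the magnitude of the index return.

```python
import math


def sharpe_ratio(returns, periods_per_year=252):
    """Annualized mean-over-volatility of per-period returns (zero risk-free rate)."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)


def omega_ratio(returns, threshold=0.0):
    """Total gains above the threshold divided by total shortfall below it."""
    gains = sum(max(r - threshold, 0.0) for r in returns)
    losses = sum(max(threshold - r, 0.0) for r in returns)
    return gains / losses  # raises ZeroDivisionError if there is no shortfall


def cvar(returns, alpha=0.95):
    """Mean loss over the worst (1 - alpha) fraction of periods (loss = -return)."""
    losses = sorted((-r for r in returns), reverse=True)
    # round() guards against floating-point noise such as 5.000000000000004
    k = max(1, math.ceil(round((1.0 - alpha) * len(losses), 9)))
    return sum(losses[:k]) / k


# Figures quoted in the abstract, in percent.
strategy_total = 7.86
index_total = -4.52
gap_pp = strategy_total - index_total        # percentage points
relative_gain = gap_pp / abs(index_total)    # assumed definition, about 2.74
```

The last three lines reproduce the 12.38 percentage point gap, and the ratio of roughly 2.74 is consistent with the stated relative improvement of over 270% under the assumed definition.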

2605.28851 2026-05-29 astro-ph.EP astro-ph.IM cs.LG physics.ao-ph

Towards a Foundation Model for the Martian Atmosphere

Sujit Roy, Udayshankar Nair, Yuling Wu, Georgios Priftis, Liping Wang, Anastasia Georgiou, Anne Jones, Björn Lütjens, Johannes Schmude, Campbell Watson, Rachel A. Slank, Ankur Kumar, Anirbit Mukherjee, Procheta Sen, Ramin Lolachi, Haonan Chen, Manil Maskey, Juan Bernabé-Moreno, Rahul Ramachandran

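AI summary Addressing challenges such as sparse data and high computational cost in modeling the Martian atmosphere, this paper explores the design space for building a data-driven foundation model, covering available data, physical models, downstream applications, and AI methods.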

English abstract

The Martian atmosphere hosts dynamical phenomena ranging from planet-encircling dust storms to mesoscale orographic clouds and nocturnal low-level jets. General circulation models can simulate these phenomena, but are computationally expensive at the resolutions needed to resolve mesoscale features. While assimilation of satellite remote sensing observations enables forecasting with such models, the observational record is often sparse, short, and fragmented across instrument generations. These constraints motivate the development of a data-driven foundation model for the Martian atmosphere. Foundation models live in a complex design landscape. There is an interplay between the available data, the physics of the underlying processes, and corresponding developments in AI. Even though the idea of a foundation model is to address multiple use cases in a data- and compute-efficient manner, it is important to have a clear picture of which applications can sensibly be addressed by a single model. The purpose of this paper is to elucidate this design landscape. We discuss available data, ranging from atmospheric retrievals to reanalysis datasets, as well as existing physical models. Moreover, we identify a wide range of candidate downstream applications. Finally, we consider relevant recent developments in artificial intelligence (AI) that can be leveraged in this context. Here, we put particular emphasis on AI models for atmospheric physics, data-driven approaches to data assimilation, and methods that work in a limited-data setting.

2605.28844 2026-05-29 cs.NE cs.LG

WASHH: An Anchor-Aware Whale-Guided Selection Hyper-Heuristic for Continuous Optimization and SVC Configuration

Yifu Zhao, Xiaofan Zou, Junhao Wei, Yanxiao Li, Baili Lu, Zhenhong Peng, Dexing Yao, Haochen Li, Qinbin He, Sio-Kei Im, Xu Yang, Yapeng Wang

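AI summary Proposes WASHH, a hyper-heuristic that uses an online reward controller to select among multiple search behaviors, achieving the best average rank in continuous optimization and the lowest validation loss in SVC hyperparameter configuration.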

English abstract

Learning-assisted algorithm design often has to make reliable search decisions under small evaluation budgets, where committing to a single metaheuristic can be unreliable. We propose WASHH, a Whale-guided Adaptive Selection Hyper-Heuristic for continuous black-box optimization. WASHH uses WOA as the main exploitation backbone, but treats PSO-style memory, GWO-style leader averaging, DE-style variation, local coordinate search, and anchor-guided refinement as selectable search behaviors. An online reward controller allocates evaluations according to observed improvements, while anchor refinement exploits inexpensive reference configurations such as box centers or default model settings without bypassing black-box evaluation. On ten 30-dimensional benchmark functions with 10 independent runs and 12,000 evaluations, WASHH achieves the best average rank, 1.10, and is best or tied best on all ten functions. It strictly improves over WOA on eight functions and ties WOA at the numerical optimum on Rastrigin and Griewank. We further study SVC hyperparameter configuration for breast cancer diagnosis under a 300-evaluation budget. WASHH obtains the lowest mean validation log loss among the compared optimizers, suggesting that anchor-aware selection hyper-heuristics are a practical lightweight direction for LEAD systems.
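
The selection mechanism described in the abstract can be illustrated with a toy selection hyper-heuristic in plain Python. This is a sketch under stated assumptions, not WASHH itself. The four operators are simplified stand-ins (there is no full WOA, PSO, or GWO update here). The reward rule, an exponential moving average of relative improvement with credit-proportional selection, is an assumption. `shifted_sphere`, `run_selection_hh`, and all parameter values are hypothetical. The anchor is the box center, and every candidate, including anchor-refined ones, is scored by the black-box objective.

```python
import random


def shifted_sphere(x):
    """Toy black-box objective with its minimum at (1, ..., 1); not from the paper."""
    return sum((v - 1.0) ** 2 for v in x)


def run_selection_hh(f, dim, lo, hi, budget, pop_size=10, seed=0):
    rng = random.Random(seed)
    anchor = [(lo + hi) / 2.0] * dim  # cheap reference configuration: the box center

    def clip(x):
        return [min(max(v, lo), hi) for v in x]

    def contract(x, best, pop):  # simplified pull toward the incumbent best
        return [v + rng.uniform(0.0, 2.0) * (b - v) for v, b in zip(x, best)]

    def de_variation(x, best, pop):  # DE-style difference vector added to the best
        a, b = rng.sample(pop, 2)
        return [c + 0.5 * (p - q) for c, p, q in zip(best, a, b)]

    def coordinate_step(x, best, pop):  # local search along a single coordinate
        y = list(x)
        j = rng.randrange(dim)
        y[j] += rng.gauss(0.0, 0.1 * (hi - lo))
        return y

    def anchor_refine(x, best, pop):  # move toward the anchor; still evaluated by f
        t = rng.random()
        return [v + t * (c - v) for v, c in zip(x, anchor)]

    ops = [contract, de_variation, coordinate_step, anchor_refine]
    credit = [1.0] * len(ops)  # one running credit per search behavior

    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    evals = pop_size
    first_best = min(fit)

    while evals < budget:
        # Spend the next evaluation on an operator chosen in proportion to its credit.
        k = rng.choices(range(len(ops)), weights=credit)[0]
        i = rng.randrange(pop_size)
        best = pop[fit.index(min(fit))]
        cand = clip(ops[k](pop[i], best, pop))
        fc = f(cand)
        evals += 1
        # Reward is the relative improvement; the floor keeps every operator selectable.
        reward = max(fit[i] - fc, 0.0) / (abs(fit[i]) + 1e-12)
        credit[k] = max(0.9 * credit[k] + 0.1 * reward, 0.05)
        if fc < fit[i]:  # greedy replacement, so the best value never worsens
            pop[i], fit[i] = cand, fc

    b = fit.index(min(fit))
    return pop[b], fit[b], credit, evals, first_best
```

A call such as `run_selection_hh(shifted_sphere, 5, -5.0, 5.0, 2000, seed=1)` returns the best point, its objective value, the final operator credits, the number of evaluations used, and the best initial value. As a side note, the reported average rank of 1.10 is consistent with rank 1 on eight functions and a two-way tie (rank 1.5) on the other two, since (8 × 1 + 2 × 1.5) / 10 = 1.10. This reading assumes fractional tie ranks, which the abstract does not state.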