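arXivDaily: a daily academic digest synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and related fields.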

2605.29353 2026-05-29 cs.CR cs.CV

DeepFake Forensics AI: A Multi-Modal Detection and Blockchain-Anchored Evidence Management Platform

Naisha Minnah

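AI summary: Proposes a unified platform that trains four neural networks to detect synthetic media in images, video, and audio, and uses the Ethereum blockchain for immutable evidence storage and management.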

Comments 5 pages, 5 figures, 3 tables

Abstract

The proliferation of AI-generated synthetic media poses a critical threat to the integrity of digital evidence in legal and forensic contexts. Existing deepfake detection systems typically address a single modality and provide no mechanism for tamper-proof evidence preservation. We present DeepFake Forensics AI, a unified platform that detects synthetic media across image, video, and audio modalities, identifies generative architecture fingerprints, and anchors forensic evidence immutably on the Ethereum blockchain. Our system trains four independent neural networks from scratch: an EfficientNet-B4 image detector (AUC = 0.9868), a Bidirectional LSTM video detector (AUC = 0.9628), an ECAPA-TDNN audio detector (EER = 18.63%), and a novel GAN fingerprinting module (accuracy = 99.88%) that identifies the generative architecture behind a fake image. Evidence files are hashed with SHA-256, stored on IPFS via Pinata, and registered on-chain via a Solidity smart contract with role-based access control. The platform provides a React frontend and FastAPI backend suitable for deployment in forensic and legal workflows. To our knowledge, this is the first system to unify multi-modal deepfake detection with blockchain-based chain-of-custody management.
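
The hashing step in the abstract can be illustrated with a minimal sketch. It computes a SHA-256 digest over an evidence file's bytes and re-checks it later. The `EvidenceRecord` structure and the function names are hypothetical and are not taken from the paper; the IPFS upload and on-chain registration are omitted.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceRecord:
    """Hypothetical record; field names are illustrative only."""
    filename: str
    sha256_hex: str


def hash_evidence(filename: str, data: bytes) -> EvidenceRecord:
    # SHA-256 over the raw bytes; any change to the file changes the digest.
    return EvidenceRecord(filename, hashlib.sha256(data).hexdigest())


def verify_evidence(record: EvidenceRecord, data: bytes) -> bool:
    # Recompute the digest and compare it with the one registered earlier.
    return hashlib.sha256(data).hexdigest() == record.sha256_hex


record = hash_evidence("clip.mp4", b"example evidence bytes")
print(verify_evidence(record, b"example evidence bytes"))   # unchanged file
print(verify_evidence(record, b"example evidence bytes!"))  # tampered file
```

Only the digest would need to be registered for later verification; the file itself can live elsewhere, which is presumably why the abstract pairs hashing with off-chain storage.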

2605.29329 2026-05-29 q-bio.QM cs.LG

Mixing Vector Model for Copolymer Inference via Mixed Integer Linear Programming

Jianshen Zhu, Raveena Rai, Taiyo Sohkawa, Naveed Ahmed Azam, Kazuya Haraguchi, Liang Zhao, Tatsuya Akutsu

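AI summary: Proposes the mixing vector model, which enables inverse design of copolymers via mixed integer linear programming, achieving high predictive accuracy on several physicochemical datasets while remaining tractable.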

Abstract

A novel two-phase molecule inference framework, mol-infer, has recently been developed to infer chemical graphs with prescribed abstract structures and desired property values through mixed integer linear programming (MILP) under the two-layered model, with guaranteed optimality and exactness relative to the given learned prediction function and structural constraints. In this study, we extend this framework to copolymers by introducing a simple feature representation, called the mixing vector (MV) model. In the proposed model, a copolymer feature vector is represented as a convex combination of MILP-tractable monomer descriptors weighted by the mixing ratio of the constituent monomers. This representation does not require explicit sequence-class information and is therefore naturally compatible with MILP-based inverse design. Under this model, we construct prediction functions for several copolymer property datasets using artificial neural networks, reduced quadratic multiple linear regression, and random forests. The proposed representation achieves practically useful predictive performance across multiple physicochemical property datasets; in particular, the best test R^2 score exceeds 0.7 for nine of the ten datasets and exceeds 0.9 for six datasets. We also formulate a multi-monomer inverse-design problem under the MV representation with a prescribed mixing ratio and show that the resulting MILP instances remain tractable, even for three-monomer settings. Finally, we perform an external consistency check by re-evaluating the inferred candidates and comparing the re-computed property values with those predicted by the learned model. Overall, the proposed framework gives a tractable first step toward model-level exact inverse design of copolymers under the two-layered model.
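
The mixing vector representation described above is a convex combination of monomer descriptors, which the following minimal sketch computes. The descriptor values, dimensions, and function name are hypothetical; the actual MILP-tractable descriptors used by mol-infer are not reproduced here.

```python
from typing import Sequence


def mixing_vector(descriptors: Sequence[Sequence[float]],
                  ratios: Sequence[float]) -> list[float]:
    """Convex combination of monomer descriptor vectors."""
    if len(descriptors) != len(ratios):
        raise ValueError("one ratio per monomer is required")
    if any(r < 0 for r in ratios) or abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must be nonnegative and sum to 1")
    dim = len(descriptors[0])
    # Component k of the copolymer vector is the ratio-weighted sum
    # of component k over all monomers.
    return [sum(r * d[k] for r, d in zip(ratios, descriptors))
            for k in range(dim)]


# Two hypothetical monomers with 3-dimensional descriptors, mixed 25:75.
monomer_a = [1.0, 0.0, 4.0]
monomer_b = [3.0, 2.0, 0.0]
print(mixing_vector([monomer_a, monomer_b], [0.25, 0.75]))
```

Because the mixing ratio is prescribed, the result is linear in the monomer descriptors, which is consistent with the abstract's claim that the representation fits naturally into an MILP.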

2605.29318 2026-05-29 cs.GR cs.CV

FreeForm: Reduced-Order Deformable Simulation from Particle-Based Skinning Eigenmodes

Donglai Xiang, Vismay Modi, Rishit Dagli, Ty Trusty, Gilles Daviet, Anka He Chen, Nicholas Sharp, David I. W. Levin

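AI summary: Proposes a mesh-free, reduced-order simulation method for hyperelastic objects based on the Reproducing Kernel Particle Method; it builds reduced-order skinning weights by solving a generalized eigensystem on the Hessian of the elastic energy, giving a 40x training speedup and lower simulation error.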

Comments CVPR 2026, project website: https://research.nvidia.com/labs/sil/projects/freeform/

Abstract

We present a novel formulation for mesh-free, reduced-order simulation of deformable hyperelastic objects. Existing work in reduced-order elastodynamic simulation represents the input geometry by either meshes, which can be difficult to obtain due to challenges in scanning and triangulating complex shapes, or by neural fields that require per-shape optimization. We propose to adopt a Reproducing Kernel Particle Method (RKPM) representation, which enables the construction of reduced-order skinning weights by solving a generalized eigensystem on the Hessian matrix of the elastic energy. We demonstrate that this formulation not only leads to a 40x training speedup compared with the per-shape optimization of neural fields, but also achieves lower simulation error when evaluated against the converged results of the finite element method. We show our simulation results on a wide variety of objects in different representations including meshes and Gaussian splats, as well as the application of our method in the downstream task of robot simulation.

2605.29277 2026-05-29 cs.SE cs.AI

Code-QA-Bench: Separating Code Reasoning from Documentation Memorization in Repository-Level QA

Jun Zhang, JianYing Qu, Hanwen Du, Zhongkai Sun, Yehua Yang, Qiao Zhao

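AI summary: Proposes the Code-QA-Bench framework, which uses answer-first generation and a three-condition experimental design to automatically build repository-level code understanding benchmarks that separate the effects of code reasoning, documentation recall, and pretraining memorization.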

Abstract

We present Code-QA-Bench, a fully automated framework for synthesizing repository-level code understanding benchmarks that separates genuine code comprehension from documentation recall and pretraining memorization. The framework makes two methodological contributions: (1) an answer-first generation pipeline where a tool-equipped agent explores source code to produce verified gold answers before deriving questions, ensuring every task is grounded in real code structure; and (2) a three-condition experimental design evaluating agents under closed-book (no repository), code-only (documentation removed), and documented (full repository) conditions, with deltas directly quantifying documentation utility and memorization. We generate 528 code-derivable and 100 doc-dependent tasks across 10 Python repositories from SWE-Bench, scored by an LLM judge on accuracy, completeness, and specificity. Experiments on four frontier models reveal that code access is the dominant factor (+0.23 mean gain over closed-book), documentation provides modest additional benefit (+0.071 on doc-dependent tasks), and code-only $\approx$ documented on code-derivable tasks, validating the design. The framework is open-source and applicable to any well-documented Python repository.
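
The three-condition design reduces to simple differences between mean scores, as the sketch below illustrates. The scores are made-up placeholders rather than results from the paper, and the key names are hypothetical.

```python
from statistics import mean

# Hypothetical per-task judge scores in [0, 1] for one model; not paper data.
scores = {
    "closed_book": [0.40, 0.50, 0.30],
    "code_only":   [0.70, 0.65, 0.60],
    "documented":  [0.72, 0.66, 0.61],
}


def condition_deltas(s: dict[str, list[float]]) -> dict[str, float]:
    closed, code, doc = (mean(s[k]) for k in
                         ("closed_book", "code_only", "documented"))
    return {
        "memorization_baseline": closed,    # answered with no repository
        "code_access_gain": code - closed,  # value of reading the code
        "documentation_gain": doc - code,   # extra value of the docs
    }


print(condition_deltas(scores))
```

A small `documentation_gain` on tasks meant to be code-derivable is the pattern the abstract reports as validating the design.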

2605.29249 2026-05-29 stat.ML cs.LG

Prediction-Powered Inference Across Many Tasks for AI Evaluation & Social Science Research

Nicolas Emmenegger, Ellery Stahler, Chara Podimata

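AI summary: Proposes a multi-task prediction-powered inference framework that exploits shared structure through cross-task recalibration to improve the efficiency of statistical inference when labels are scarce, and proves that nonlinear structure is a necessary condition for cross-task gains.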

Abstract

Many applications require statistically valid inference across many related tasks, while using only a handful of high-quality labels per hypothesis. In AI evaluation, these tasks may correspond to model behaviors across prompts, subgroups, or hypotheses; in social science surveys, they may correspond to related questions, populations, or measurement conditions. Prediction-powered inference (PPI) uses abundant but inexpensive proxy measurements to improve inference from limited, ground-truth labels, but commonly used methods treat tasks independently and therefore fail to exploit shared structure across related tasks. This limitation is especially important in settings where only a small number of labels are available per task. To address this issue, we introduce a multi-task prediction-powered inference framework that uses labeled data from related tasks to improve power while preserving task-specific inference. Our methods exploit the shared structure in the proxy-ground-truth relationship through cross-task recalibration, while retaining within-task rectification and power tuning to construct accurate point estimates and confidence intervals. We prove that efficiency gains beyond power-tuned PPI are only possible when the proxy-ground-truth relationship contains nonlinear structure; affine cross-task recalibrations are asymptotically equivalent to using the original proxy. We complement our theoretical findings with experiments on synthetic and semi-synthetic datasets, as well as a case study auditing language models on election-related information during the 2024 U.S. presidential election. Using a large human-annotation study, we show that cross-task recalibration can substantially reduce confidence interval widths when labels are scarce.
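
For readers unfamiliar with the single-task baseline, the following is a minimal sketch of the standard power-tuned PPI estimator for a mean. It does not implement the paper's cross-task recalibration. Function and argument names are hypothetical.

```python
from statistics import mean


def ppi_mean(y_labeled, f_labeled, f_unlabeled, lam=1.0):
    """Power-tuned prediction-powered estimate of a mean.

    y_labeled:   ground-truth labels on the small labeled set
    f_labeled:   proxy predictions on the same labeled items
    f_unlabeled: proxy predictions on the large unlabeled set
    lam:         power-tuning weight; lam=0 ignores the proxy entirely
    """
    # Labeled mean, corrected by how far the proxy mean on the large
    # sample sits from the proxy mean on the labeled sample.
    return mean(y_labeled) + lam * (mean(f_unlabeled) - mean(f_labeled))


def tuned_lambda(y_labeled, f_labeled, n_unlabeled):
    """Variance-minimizing weight: Cov(Y, f) / ((1 + n/N) * Var(f))."""
    n = len(y_labeled)
    y_bar, f_bar = mean(y_labeled), mean(f_labeled)
    cov = sum((y - y_bar) * (f - f_bar)
              for y, f in zip(y_labeled, f_labeled)) / (n - 1)
    var = sum((f - f_bar) ** 2 for f in f_labeled) / (n - 1)
    return cov / ((1 + n / n_unlabeled) * var)
```

With very few labels per task, the covariance and variance estimates inside `tuned_lambda` are themselves noisy, which is one way to see why borrowing labeled data from related tasks could help.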

2605.29245 2026-05-29 cs.CR cs.CL cs.LG

Implicit Identity Technologies for LLMs: Fingerprinting and Watermarking across Datasets, Models, and Generated Content

Bing Liu, Shunping Wang, Yufan Zhu, Xinyi Yu, Jing Huang, Linkang Du, Hongbin Pei, Wei Luo

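AI summary: Surveys LLM fingerprinting and watermarking, proposes implicit identity as a unifying abstraction, organizes techniques for datasets, models, and generated content under a lifecycle-based taxonomy, and establishes an evaluation framework.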

Comments Accepted by IJCAI-ECAI 2026. 11 pages, 1 figure. Survey and taxonomy of LLM fingerprinting and watermarking for identity, provenance, generated-content attribution, and asset protection

Abstract

This paper presents a survey and taxonomy of LLM fingerprinting and watermarking for identity, ownership verification, provenance, and generated-content attribution. Large language models (LLMs) require substantial investments in data, computation, and expertise, and are increasingly deployed in high-stakes settings, making it critical to protect LLM-related assets and trace their origins. Existing work has rapidly expanded across dataset provenance, model ownership, and generated-content detection, but the field remains fragmented: fingerprinting and watermarking are often used inconsistently, and methods are typically studied within isolated asset-specific settings. To address this gap, we introduce implicit identity as a unifying abstraction for verifiable but not directly observable identity signals in LLM systems. We distinguish fingerprinting as non-intrusive identity derived from intrinsic characteristics, and watermarking as intrusive identity deliberately embedded into data, models, or generated content. We then propose a lifecycle-based taxonomy that organises techniques across datasets, models, and generated content, and further separates them by verification semantics: similarity-based attribution and keyed verification. Finally, we establish an evaluation framework centred on identifiability, robustness, and deployability, summarising representative metrics under realistic access and transformation regimes. By unifying terminology, lifecycle stages, and evaluation objectives, this survey provides a structured foundation for studying LLM identity technologies and for developing more reliable mechanisms for asset protection and provenance.

2605.29191 2026-05-29 eess.SY cs.RO cs.SY math.OC

Distributed Non-Uniform Scaling Control of Multi-Agent Formation with Dynamic Agent Joining

Tao He, Gangshan Jing

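AI summary: For multi-agent formations with dynamically joining agents, proposes a distributed non-uniform scaling control framework that adjusts the formation shape in arbitrary dimensions by preserving the spectral properties of the graph Laplacian.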

Comments This paper has been accepted by IFAC 2026

2605.29141 2026-05-29 cs.IR cs.AI

Toward User Preference Alignment in LLM Recommendation via Explicit Context Feedback

Weizhi Zhang, Wooseong Yang, Yuxin Cui, Zhaohui Guo, Hins Hu, Liangwei Yang, Henry Peng Zou, Qifei Wang, Hanqing Zeng, Jiayi Liu, Yinglong Xia, Philip S. Yu

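AI summary: Argues that LLM-based recommender systems should prioritize explicit context feedback (such as review text) to align with user preferences, improving the personalization and explainability of recommendations.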

Comments Published in CogMI 2025. https://ieeexplore.ieee.org/abstract/document/11417068

Abstract

Traditional recommender systems (RecSys) primarily infer user preferences from implicit signals (such as clicks, watches, and purchases), often neglecting the rich explicit contextual feedback users provide through verbal text, like comments and reviews. This explicit context feedback captures the nuanced reasons behind user decisions regarding their preferences. In addition, it offers critical heterogeneous information for user preference alignment and more explainable recommendations. Overlooking such signals can lead to misaligned user preferences and further reinforce filter bubbles, as algorithms fail to understand the "semantic context" behind user choices. Recent advances in Large Language Models (LLMs) present new opportunities to harness user-generated content for more accurate and diverse recommendations, yet current LLM-based recommendations still focus on using item meta-data and underutilize this resource. In this paper, we advocate for prioritizing explicit context feedback in the next generation of LLM-based RecSys. We review the evolution of recommendation paradigms, highlight the value of context-rich feedback, call for new benchmarks and metrics, and introduce frameworks for integrating explicit user signals into scalable LLM-driven RecSys. Centering on user-preference modeling, we aim to foster more personalized, transparent, and explainable RecSys online platforms.

2605.29139 2026-05-29 stat.ML cs.LG

Anytime-Valid Federated Conformal RAG for LLM Swarms

Prasanjit Dubey, Xiaoming Huo

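AI summary: Proposes Anytime-FC-RAG, which uses a summable per-step calibration-deviation budget and a truncated betting e-process to extend federated conformal RAG to sequential coverage that is valid at every stopping time, guaranteeing time-uniform alarm validity, a Hoeffding-stitched cumulative-miscoverage envelope, and safety under adaptive control.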

Abstract

Federated Conformal RAG (FC-RAG) provides distribution-free coverage for a bandwidth-limited swarm of weak language models, but only at a fixed horizon. We extend it to anytime-valid sequential coverage: validity at every stopping time, preserved under predictable adaptive control (recalibration, per-node bandwidth escalation, distilled-student refresh), at no extra cost in assumptions over fixed-horizon FC-RAG. Naive composition fails because FC-RAG's marginal coverage bound makes the betting e-process a non-supermartingale on adverse calibration draws, and Ville's inequality cannot be invoked. We give Anytime-FC-RAG, a sequential extension built on a summable per-step calibration-deviation budget that converts the marginal bound into a strict conditional bound on a calibration-good event, paired with a truncated betting e-process that is a nonnegative supermartingale on the entire probability space. From these two ingredients, we obtain four guarantees: time-uniform alarm validity $\mathbb{P}(\sup_t E_t \ge 1/\delta_e) \le \delta_e + \delta_{\mathrm{cal}}$, a Hoeffding-stitched cumulative-miscoverage envelope at the same total budget, safety under any predictable controller (recalibration, bandwidth escalation, student refresh), and training-side error propagation across an unbounded sequence of Federated Probe-Logit Distillation (FPLD) refreshes via a summable training budget. As a practical consequence, an adaptive controller that escalates retrieval bandwidth only when the e-process crosses a warning threshold matches the alarm rate of a fixed-high-bandwidth schedule at substantially lower communication cost. Experiments on a GPT-2-small + MiniLM swarm across MMLU, DBpedia, and AG News verify the predicted alarm rate, detection delay, envelope coverage, and 14-57% bandwidth savings; the alarm fires when and only when coverage genuinely breaks.
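
To make the alarm mechanism concrete, here is a minimal sketch of a plain fixed-stake betting e-process on miscoverage indicators. It is not the paper's truncated construction and includes no calibration budget. If each miscoverage probability is at most `alpha` given the past, the running product is a nonnegative supermartingale, so Ville's inequality bounds the chance of ever crossing `1/delta` by `delta`. All names are hypothetical.

```python
def betting_e_process(miscovered, alpha, bet):
    """Running product E_t = prod_s (1 + bet * (X_s - alpha)).

    miscovered: iterable of 0/1 flags (1 = the prediction set missed)
    alpha:      target miscoverage level, in (0, 1)
    bet:        fixed stake in [0, 1/alpha], so every factor is nonnegative
    """
    if not 0 <= bet <= 1 / alpha:
        raise ValueError("bet must lie in [0, 1/alpha]")
    e, path = 1.0, []
    for x in miscovered:
        e *= 1 + bet * (x - alpha)
        path.append(e)
    return path


def first_alarm(path, delta):
    """1-based index of the first step with E_t >= 1/delta, else None."""
    for t, e in enumerate(path, start=1):
        if e >= 1 / delta:
            return t
    return None


always_missing = betting_e_process([1] * 10, alpha=0.1, bet=1.0)
always_covered = betting_e_process([0] * 10, alpha=0.1, bet=1.0)
print(first_alarm(always_missing, delta=0.05))
print(first_alarm(always_covered, delta=0.05))
```

The same running value could drive a controller of the kind the abstract describes, for example by escalating bandwidth once it crosses a warning level below `1/delta`.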

2605.29121 2026-05-29 math.DS cs.AI cs.LG

A Minimal Bifurcation Model of Load Imbalance in a Softmax Mixture-of-Experts Router

O. M. Kiselev

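AI summary: Proposes a minimal dynamical model of adaptive softmax routing for a two-expert Mixture-of-Experts layer, derived as the mean-field limit of a discrete reinforcement rule; finds that a supercritical pitchfork bifurcation leads to load imbalance, and derives exact parametric equations for the bifurcation set and the cusp catastrophe.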

Comments 21 pages, 11 figures

Abstract

We propose a minimal dynamical model of adaptive softmax routing for a two-expert Mixture-of-Experts (MoE) layer. The model is obtained as a mean-field limit of a discrete reinforcement rule: the selected expert receives a small score increment, while all scores undergo regularizing decay. In the symmetric case the limiting system has a supercritical pitchfork bifurcation: for weak feedback there is a unique stable balanced state, whereas above a critical feedback strength two stable asymmetric states appear. When an external asymmetry is added, the pitchfork unfolds into a pair of fold bifurcations forming a cusp in the control-parameter plane. We derive exact parametric equations for the bifurcation set and the local normal form of the cusp catastrophe. Numerical experiments connect this picture to empirical expert load, a small trainable MoE model, hard top-1 PyTorch routing, and a small classification experiment on digits. The results provide a controlled low-dimensional mechanism for abrupt transitions to load imbalance in adaptive MoE routers.
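
One plausible reading of the symmetric case can be simulated in a few lines. The parameterization below is an assumption, not the paper's exact system: each score follows ds_i/dt = beta * p_i - gamma * s_i with p = softmax(s), so the score gap d = s1 - s2 obeys dd/dt = beta * tanh(d/2) - gamma * d. Linearizing at d = 0 gives growth rate beta/2 - gamma, so under this assumption the balanced state loses stability at beta = 2 * gamma.

```python
import math


def simulate_gap(beta, gamma, d0=0.1, dt=0.01, steps=5000):
    """Euler-integrate dd/dt = beta * tanh(d / 2) - gamma * d.

    d is the score gap s1 - s2 between two experts. For a two-way softmax
    the load gap is p1 - p2 = tanh(d / 2), so reinforcement pushes the gap
    apart at rate beta * tanh(d / 2) while decay pulls it back at gamma * d.
    """
    d = d0
    for _ in range(steps):
        d += dt * (beta * math.tanh(d / 2) - gamma * d)
    return d


# Weak feedback (beta < 2 * gamma): the gap decays to the balanced state.
print(simulate_gap(beta=1.0, gamma=1.0))
# Strong feedback (beta > 2 * gamma): the gap settles at a nonzero value.
print(simulate_gap(beta=4.0, gamma=1.0))
```

Starting from a negative `d0` in the strong-feedback case gives the mirror-image state, matching the pair of stable asymmetric states the abstract describes.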

2605.29115 2026-05-29 cs.CR cs.AI

unix-ctf: Procedural Environments for Unix-Competence Reinforcement Learning

Geoffrey Bradway, Roger Creus Castanyer, Lorenz Wolf, Maxwill Lin, Matthew James Sargent, Augustine N. Mavor-Parker

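AI summary: Presents unix-ctf, an environment that procedurally generates capture-the-flag tasks for shell agents; an LLM-assisted synthesis pipeline produces reusable hide-and-find script pairs, and fine-tuning Qwen3-8B on them raises the solve rate from 11.6% to 43.6%, showing that Unix competence is separable and trainable.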

Abstract

Unix competence is the ability to use shell and operating-system primitives as first-class tools, not merely to write programs through a terminal. Current terminal benchmarks tend to blur this distinction: a solver fluent in Python but weak in Unix can pass a substantial fraction of Terminal-Bench 2.0, while the reverse skill profile is rarely exercised. We make the distinction operational and build a training surface for the Unix component. unix-ctf is a procedural generator of capture-the-flag tasks for shell agents. Each task hides a short token (a flag of the form flag(a3b1c9...)) inside a fresh Linux container using a single Unix feature, and the agent must recover it. Tasks are produced by an LLM-assisted synthesis pipeline that generates candidate hiding techniques, rewrites them into parameterized hide-and-find script pairs, and filters them with a bidirectional contract: the hide script must leave no plaintext trace of the flag on disk, and the find script must recover the flag in a fresh directory. Because the LLM only writes the planting and recovery steps (the container, layout, and grading harness are fixed), the pipeline lands 656 of 750 raw attempts as portable, reusable variants (87.5%). Our reproduction of Endless Terminals' full-container-generation approach lands only 17.4% under the same checks. The 656 variants canonicalize to 155 distinct techniques. Fine-tuning Qwen3-8B with LoRA using GRPO on this surface lifts solve rate from 11.6% to 43.6% on a 15-skill multi-family holdout (n=225), redistributes which InterCode-CTF tasks the model solves, and produces a +33 pp gain in Forensics while reaching 32/100 on InterCode-CTF. These results suggest that Unix competence is separable, trainable, and best evaluated directly rather than folded into programming-through-a-shell.
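
The bidirectional contract can be sketched as a small check. This is a stand-in written in Python rather than shell, and base64 encoding substitutes for a real Unix hiding feature; the file name, function names, and flag value are all hypothetical and not taken from the unix-ctf pipeline.

```python
import base64
import tempfile
from pathlib import Path


def hide(flag: str, root: Path) -> None:
    # Stand-in technique: store the flag base64-encoded in a dotfile.
    (root / ".cache.bin").write_bytes(base64.b64encode(flag.encode()))


def find(root: Path) -> str:
    return base64.b64decode((root / ".cache.bin").read_bytes()).decode()


def contract_holds(flag: str, hide_fn=hide, find_fn=find) -> bool:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        hide_fn(flag, root)
        # Direction 1: no file on disk contains the plaintext flag.
        leaked = any(flag.encode() in p.read_bytes()
                     for p in root.rglob("*") if p.is_file())
        # Direction 2: the find step recovers exactly the planted flag.
        return not leaked and find_fn(root) == flag


print(contract_holds("flag(a3b1c9)"))
```

A candidate pair that fails either direction would be filtered out, which is the role the abstract assigns to the contract.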

2605.29114 2026-05-29 cs.CR cs.LG cs.RO

ReasonBreak: Probing Vulnerabilities in Reasoning-Enabled Vision-Language-Action Models for Autonomous Driving

Mohammadreza Teymoorianfard, Jean-Philippe Monteuuis, Jonathan Petit, Amir Houmansadr

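AI summary: Using black-box attacks, presents the first systematic study of how vulnerable reasoning-enabled vision-language-action models for autonomous driving are to realistic input perturbations, finding that both reasoning and trajectory generation can be attacked, which raises collision rates.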

Abstract

Vision-Language-Action (VLA) models with integrated reasoning have been proposed for end-to-end autonomous driving, assuming a tight coupling between reasoning and trajectory generation. However, the robustness of such systems under realistic input perturbations remains largely unexplored. We show that these models are highly vulnerable to realistic input perturbations, achieving up to 89% attack success rate (ASR) on reasoning and up to 72% on trajectory manipulation in closed-loop simulation, leading to increased collision rates and degraded safety metrics. Using NVIDIA's recent Alpamayo models as representative industry-developed VLAs, we conduct the first systematic black-box study of reasoning-enabled VLA models under realistic textual input corruptions, evaluating their impact on reasoning and driving behavior. We introduce a reasoning-aware evaluation framework capturing both semantic and structural aspects of reasoning, along with safety-centric measures. We also introduce a benchmark for evaluating attacks and defenses on reasoning-trajectory interactions in autonomous driving. Our results highlight the need for rigorous evaluation and improved defenses to ensure the safety of reasoning-enabled VLA systems in autonomous driving.

2605.29063 2026-05-29 eess.IV cs.CV

Accelerating HEVC Intra Partitioning via a CNN-Hierarchical Attention Transformer Hybrid

Krishna Kumar Sharma, Somdyuti Paul

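AI summary: Proposes HFViT, a hybrid architecture that fuses reparameterized depthwise-separable convolutions with a Hierarchical Attention Transformer to propagate global information efficiently at low complexity, reducing the VMAF BD-rate penalty in HEVC intra partition prediction while keeping CPU latency low.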

Abstract

The recursive quad-tree partitioning in High Efficiency Video Coding (HEVC) incurs considerable computational overhead, with exhaustive rate-distortion optimization for CTU partition prediction consuming the dominant share of encoding time. Although partition prediction through deep learning has emerged as a viable encoding accelerator, an architectural dichotomy remains largely unaddressed: CNNs are computationally efficient but spatially myopic due to their localized effective receptive fields, failing to capture long-range semantic relationships and repetitive textures; conversely, transformer-based architectures are better at capturing global context but incur prohibitive CPU latency, a critical liability because deployment is predominantly CPU-bound. This paper introduces Hybrid Fast Vision Transformer (HFViT), a hybrid architecture designed to accelerate HEVC intra-mode partition prediction. HFViT fuses a reparameterized depthwise-separable convolutional backbone with a Hierarchical Attention Transformer (HAT) mechanism, leveraging a carrier token scheme to enable efficient global information propagation at sub-quadratic complexity. Post-training structural fusion collapses batch normalization into preceding layers to further reduce latency. Comprehensive evaluation reveals the efficacy of HFViT in accelerating HEVC intra-encoding across resolutions. On standard JCT-VC test sequences, HFViT reduces the average VMAF BD-rate penalty by 2.4, 2.6, and 7.9 percentage points on Classes A, B and E, respectively, as compared to the competing ETH-CNN baseline while maintaining CPU inference latency within 8% of the CNN baseline and surpassing it on GPU by 40%, establishing practical viability for real-time encoder integration.
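
Folding batch normalization into a preceding affine layer is a standard inference-time identity, sketched below for a single scalar channel. The parameter values are hypothetical and the code is not taken from HFViT; a convolution applies the same per-output-channel scaling to its kernel and bias.

```python
import math


def fold_batchnorm(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold y = gamma * (z - mean) / sqrt(var + eps) + beta into z = w*x + b.

    Returns (w_folded, b_folded) such that w_folded * x + b_folded equals
    the batch-normalized output for every input x.
    """
    scale = gamma / math.sqrt(var + eps)
    return w * scale, (b - mean) * scale + beta


# Hypothetical parameters for one channel.
w, b = 0.8, -0.1
gamma, beta, mean, var = 1.5, 0.2, 0.05, 0.9

w_f, b_f = fold_batchnorm(w, b, gamma, beta, mean, var)

x = 2.0
unfused = gamma * ((w * x + b) - mean) / math.sqrt(var + 1e-5) + beta
fused = w_f * x + b_f
print(abs(unfused - fused) < 1e-12)
```

The fused layer does one multiply-add where the original did an affine map followed by a normalization, which is where the latency saving comes from.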

2605.29059 2026-05-29 cs.SE cs.AI cs.CR

SCDBench: A Benchmark for LLM-Based Smart Contract Decompilers

Kaihua Qin, Dawn Song, Arthur Gervais

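AI summary: To address the lack of a unified benchmark for evaluating smart contract decompilation, proposes the SCDBench dataset and evaluation methodology, which tests the decompilation ability of frontier LLMs through four cumulative stages (format completeness, compilability, ABI recovery, semantic consistency) and finds that semantic consistency remains far from solved.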

Abstract

Smart contract decompilation aims to recover high-level source code from bytecode, but evaluating decompilers remains difficult because existing studies use narrow datasets, inconsistent metrics, and limited semantic consistency checks. This gap is increasingly important as large language models (LLMs) begin to generate source-like Solidity that may compile and appear plausible, even when its semantics diverge from the original contract. We introduce SCDBench, a dataset and benchmark methodology for LLM-based smart contract decompilation. The dataset contains 600 real-world Solidity contracts with paired bytecode inputs, ground-truth source code, and replayable semantic checkpoints. SCDBench evaluates decompiler outputs through four cumulative stages: format completeness, compilability, Application Binary Interface (ABI) recovery, and semantic consistency via differential replay. We evaluate Claude Opus 4.7, GPT-5.3-Codex, and GLM-5 in a zero-shot decompilation setting, including GLM-5 variants with and without extended reasoning and a zero-shot compilation-repair setting. The results show that frontier LLMs can often produce structured and compilable Solidity, but achieving semantic consistency remains far from solved: the best-performing frontier model perfectly decompiles only 42/600 contracts. We further show that introducing same-model compilation repair substantially improves performance at modest additional cost. SCDBench establishes a common ground for rigorous, reproducible evaluation and aims to accelerate the development of reliable smart contract decompilers for blockchain security and transparency.

2605.29016 2026-05-29 astro-ph.IM astro-ph.CO cs.LG

Dynamics of Stochastic Momentum under High-Dimensional Sparse Updates

Katie Everett, Elliot Paquette

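AI summary: Using least squares and logistic regression models, theoretically analyzes the dynamics of momentum under sparse updates, reveals a phase structure governed by the ratio of the momentum retention timescale to the learning timescale, and finds a spectral conflict in the oscillatory dynamics across different token sparsities.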

Abstract

Existing theory of momentum assumes that gradients arrive at every parameter at a roughly constant rate, an assumption violated in practice by heavy-tailed data distributions and modern architectures. We theoretically analyze the dynamics of two tractable models of momentum under sparse updates: a least squares model with sparse inputs and a logistic regression model with a rare class. Both admit exact closed-form second-moment dynamics whose high-dimensional limits we characterize across three scaling exponents for sparsity, batch size, and momentum decay. The phase structure on both problems is governed by the ratio of two intrinsic timescales: a momentum retention timescale (how many active updates the buffer survives) and a learning timescale (how many active updates it takes to reduce the squared error). When learning is much slower than retention, the limit matches SGD; when learning is faster, the system is unstable; where the timescales coincide, we recover classical heavy-ball dynamics. The oscillatory dynamics occur at different momentum values for different token sparsity, creating a spectral conflict for global momentum across token frequencies.

2605.28940 2026-05-29 hep-ph cs.LG hep-ex physics.data-an

Neural Scaling Laws for Jet Generation

Oz Amram, Darius A. Faroughy, Tjarko Gerdes, Anna Hallin, Gregor Kasieczka, Michael Krämer, Humberto Reyes-Gonzalez, David Shih

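AI summary: This paper is the first to explore scaling laws for particle jet generation; it finds that model-size scaling follows a logarithmic law and shows that the next-token-prediction validation loss is monotonically related to physics performance.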

Abstract

Recently observed empirical scaling laws describe how the performance of foundation-type models changes as three independent key quantities -- dataset size, compute, and model parameters -- are varied. Extracting these scaling laws informs the training of large complex models for which tuning hyperparameters in traditional ways is not feasible. This work explores for the first time whether scaling laws can also be observed for the task of particle jet generation, which is relevant both as a pre-training objective for foundation models and as in-situ simulation in its own right. We indeed replicate the key logarithmic scaling law behavior for model-size scaling. Beyond studying the next token prediction validation loss of the generative model, we also study the sliced Wasserstein distance of five physical quantities that are not immediately available to the model during training. Our study shows that this quantity is monotonically related to the next token prediction validation loss, meaning that this loss is indeed a good proxy for the physics performance. For the scaling with dataset size and compute, we observe substantially weaker scaling behavior of both the loss and the sliced Wasserstein distance. We analyze this behavior by introducing the concept of a learnable window, and argue that autoregressive next token prediction on jet constituents exhibits comparatively rapid saturation relative to language-model studies. We discuss possible origins of this behavior, including the stochastic nature of QCD radiation and differences between generative and supervised learning tasks in collider physics.
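
For readers unfamiliar with the metric, a sliced Wasserstein distance between two samples can be estimated as in the minimal sketch below. It assumes equal sample sizes and uses the 1-Wasserstein distance along random unit directions; the abstract does not say which order or projection scheme the authors use, so these choices and the function name are illustrative.

```python
import math
import random


def sliced_wasserstein(xs, ys, n_projections=64, seed=0):
    """Average 1D Wasserstein-1 distance over random unit projections."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(n_projections):
        # Draw a random direction on the unit sphere.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(d * d for d in direction))
        direction = [d / norm for d in direction]
        # Project both samples; in 1D, W1 is the mean gap between sorted values.
        px = sorted(sum(d * v for d, v in zip(direction, x)) for x in xs)
        py = sorted(sum(d * v for d, v in zip(direction, y)) for y in ys)
        total += sum(abs(a - b) for a, b in zip(px, py)) / len(px)
    return total / n_projections


reference = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)]
generated = [(0.5, 0.0), (1.5, 0.5), (2.5, 1.0)]
print(sliced_wasserstein(reference, generated))
```

Identical samples give a distance of zero, and the value grows as the generated sample drifts away from the reference.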

2605.28914 2026-05-29 cs.CR cs.AI

LogDx-CI: Benchmarking Log-Reduction Tools for LLM Root-Cause Diagnosis

Bowen Qin

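AI summary: Introduces the LogDx-CI benchmark, which compares 11 log-reduction tools on 35 real CI failure cases. Hybrid grep+tail routers dominate on cost and quality; an agent loop narrows the quality gap while cost differences persist; and cross-family LLM summarizers outperform same-family ones.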

Abstract

CI failure logs are large (median 5k lines, max 200k in this corpus) and noisy. Coding agents that try to debug them depend on an upstream tool to reduce the log to a manageable context, but the field has had no public empirical comparison of which reductions preserve enough evidence for downstream LLM diagnosis. We introduce LogDx-CI, a benchmark that compares 11 context-reduction tools (raw, tail, grep, three RTK modes, two real LLM map-reduce summarizers, three hybrid routers) on 35 real GitHub Actions failure cases, scored by 3 LLM debugger families (Claude Haiku 4.5, Claude Sonnet 4.6, OpenAI gpt-5-mini) plus a Sonnet 4.6 tool-using agent. We report three load-bearing findings. (1) Hybrid grep+tail routers dominate the cost-quality Pareto frontier; the top two methods score 0.670 / 0.666 at about \$0.03 per case, quality in the same ballpark as standalone grep at $4.5\times$ fewer tokens. (2) In the agent-loop regime, the quality range across reduction tools collapses $7\times$ (single-shot spread 0.42 $\to$ agent-loop spread 0.059); the agent rescues weak contexts via follow-up tool calls. However, cost differences persist: weak contexts force the agent to issue 2--4$\times$ more tool calls to recover. (3) A cross-family LLM-summary pair (gpt-5-mini summarizer feeding a Claude Haiku debugger) beats the same-family pair by $+0.071$ averaged across four diagnoser variants, falsifying the self-call-bias hypothesis on this task. The gpt-5-mini summarizer is also the #1 agent-loop method (score 0.749) at 0.37 tool calls per case and $10\times$ lower reducer cost than the Haiku summarizer (\$0.18 vs \$1.75 per case). All data, code, per-case bundles, and reproducibility infrastructure are public.
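
As a rough illustration of what a grep+tail reduction might look like, the sketch below keeps the last few lines of a log plus every line that matches an error pattern, with a little surrounding context. The abstract does not describe how the benchmarked routers choose between or combine grep and tail, so the union rule, the pattern, and the `reduce_log` name are all assumptions.

```python
import re

# Hypothetical error pattern; a real tool would likely tune this per CI system.
ERROR_PATTERN = re.compile(r"error|failed|exception|traceback", re.IGNORECASE)


def reduce_log(lines, tail=50, context=2, pattern=ERROR_PATTERN):
    """Keep the final `tail` lines plus pattern hits with `context` lines around them."""
    keep = set(range(max(0, len(lines) - tail), len(lines)))
    for i, line in enumerate(lines):
        if pattern.search(line):
            keep.update(range(max(0, i - context), min(len(lines), i + context + 1)))
    # Emit the kept lines in their original order without duplicates.
    return [lines[i] for i in sorted(keep)]


log = ["ok %d" % i for i in range(100)]
log[10] = "ERROR: build failed"
reduced = reduce_log(log, tail=5, context=1)
print(len(reduced), reduced[:3])
```

The tail covers failures reported at the end of a job, while the pattern hits recover evidence buried earlier in the log; how well either preserves diagnostic evidence is what the benchmark measures.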

2605.28861 2026-05-29 cond-mat.str-el cond-mat.dis-nn cs.LG

Comment on "Spin-1/2 Kagome Heisenberg Antiferromagnet: Machine Learning Discovery of the Spinon Pair-Density-Wave Ground State"

Helia Kamal, Dominik Kufel, DinhDuy Vu, Chris R. Laumann, Norman Y. Yao

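AI summary: Argues that, in the group-equivariant convolutional neural network study of the kagome Heisenberg antiferromagnet ground state, the reported low energies are artifacts of broken ergodicity caused by single-spin-flip updates in Metropolis-Hastings sampling; with spin-exchange updates the network converges to energies above the DMRG results, calling the original conclusions into question.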

Comments 3 pages, 1 figure; Comment on arXiv:2401.02866

Abstract

A recent article [Phys. Rev. X 15, 011047 (2025)] utilizes group-equivariant convolutional neural networks to study the ground state of the kagome Heisenberg antiferromagnet. On the largest finite-size cluster studied to date ($N=108$), the authors report variational energies significantly lower than other numerical methods, including state-of-the-art density matrix renormalization group (DMRG) calculations. In contrast to previous results suggesting a possible spin-liquid ground state, the authors observe a spinon pair-density-wave ground state. We find that: (i) the reported low energies are artifacts of broken ergodicity in the Metropolis--Hastings sampling, since the single-spin-flip update rule utilized by the authors effectively freezes the Markov chains; and (ii) when ergodic sampling is enforced via spin-exchange updates, the neural network converges to energies significantly higher than existing DMRG results, calling the paper's claims into question.
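
The two proposal rules differ in a simple structural way, sketched below for spins stored as +1 or -1. A single flip changes the total magnetization, while an exchange of one up spin with one down spin conserves it. The function names are hypothetical, and swapping an arbitrary up/down pair rather than neighbors is a simplification. The sketch does not reproduce the Comment's analysis of why the chains freeze.

```python
import random


def propose_single_flip(spins, rng):
    """Flip one randomly chosen spin; the total magnetization changes by 2."""
    i = rng.randrange(len(spins))
    new = list(spins)
    new[i] = -new[i]
    return new


def propose_exchange(spins, rng):
    """Swap one up spin with one down spin; the total magnetization is conserved."""
    ups = [i for i, s in enumerate(spins) if s == 1]
    downs = [i for i, s in enumerate(spins) if s == -1]
    if not ups or not downs:
        return list(spins)  # fully polarized: no exchange is possible
    i, j = rng.choice(ups), rng.choice(downs)
    new = list(spins)
    new[i], new[j] = new[j], new[i]
    return new


rng = random.Random(0)
spins = [1, -1] * 6  # 12 spins with zero total magnetization
print(sum(propose_single_flip(spins, rng)), sum(propose_exchange(spins, rng)))
```

If the sampled distribution puts essentially all of its weight in one magnetization sector (an assumption made here for illustration), every single-flip proposal leaves that sector, whereas exchange proposals stay inside it.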

2605.28858 2026-05-29 cs.CE cs.LG math-ph math.MP

An End-to-End PyTorch Interface for Differentiable PDE Solvers: A RANS Model-Correction Study

Luca Saverio, Michele Alessandro Bucci, Gianmarco Farro, Cédric Content, Denis Sipp

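AI summary: Proposes an end-to-end differentiable machine learning framework that integrates the PDE into PyTorch as an implicit layer to optimize a parametrized correction term for data assimilation and closure modeling, validated on the RANS equations for compressible flows.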

Abstract

This work presents an end-to-end strategy for solving inverse problems constrained by Partial Differential Equations within a fully differentiable Machine Learning framework. The proposed formulation provides a unified and user-friendly methodology applicable to a wide range of problems, from data assimilation to closure modeling. Our approach combines a baseline differentiable PDE solver, which predicts the state $w$ from the nonlinear system $R(w) = 0$, with a generic additive, parametrized, and differentiable correction $f_ϕ(w)$, with trainable parameters $ϕ$. We show how to optimize $ϕ$ within a fully differentiable Python workflow by reformulating the PDE as an implicit layer, enabling its integration into arbitrary objective functions, while leveraging PyTorch's automatic differentiation graph. The method is demonstrated on the Reynolds-Averaged Navier-Stokes equations for compressible flows, where the closure term, or a portion of it, is modeled using trainable parameters or a Neural Network. The first application considers the 2D NASA Wall-Mounted Hump test case, where a production-term parameter is optimized against time-averaged LES data. A second application is carried out on the VKI LS-59 turbine blade, where the Spalart-Allmaras eddy viscosity field is reconstructed through the optimization of a trainable spatial field. A dataset is generated starting from the VKI LS-59 turbine blade geometry using the differentiable BROADCAST solver with the Spalart-Allmaras turbulence model. The results highlight the flexibility of the framework, showing its applicability beyond turbulence modeling to a broader class of physics-informed PDE-constrained problems with data-driven components.
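
The implicit-layer idea can be illustrated on a scalar toy problem in plain Python, without PyTorch. The sketch assumes the corrected system has the form $R(w) + f_ϕ(w) = 0$ and picks $R(w) = w^3 + w - 1$ and $f_ϕ(w) = ϕ w$ purely for illustration; none of these choices come from the paper. The gradient of an objective with respect to $ϕ$ follows from the implicit function theorem, $dw/dϕ = -(\partial G/\partial w)^{-1}\,\partial G/\partial ϕ$ with $G = R + f_ϕ$, so no differentiation through the solver iterations is needed.

```python
def residual(w, phi):
    """Toy corrected residual G(w, phi) = R(w) + f_phi(w)."""
    return w**3 + w - 1.0 + phi * w


def d_residual_dw(w, phi):
    """Partial derivative of G with respect to the state w."""
    return 3.0 * w**2 + 1.0 + phi


def solve_state(phi, w0=1.0, tol=1e-14, max_iter=100):
    """Newton iteration for G(w, phi) = 0; stands in for the PDE solver."""
    w = w0
    for _ in range(max_iter):
        step = residual(w, phi) / d_residual_dw(w, phi)
        w -= step
        if abs(step) < tol:
            break
    return w


def objective_and_gradient(phi, w_target):
    """Squared misfit to a target state and its gradient with respect to phi."""
    w = solve_state(phi)
    dw_dphi = -w / d_residual_dw(w, phi)  # implicit function theorem; dG/dphi = w
    loss = (w - w_target) ** 2
    grad = 2.0 * (w - w_target) * dw_dphi
    return loss, grad


loss, grad = objective_and_gradient(phi=0.5, w_target=0.2)
print(loss, grad)
```

In an automatic-differentiation framework the same two pieces, a forward solve and a backward linear solve with the residual Jacobian, would typically be wrapped as a custom differentiable operation so that the solver composes with arbitrary objective functions.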

2605.28853 2026-05-29 q-fin.PM cs.LG

Financially Guided Deep Portfolio Optimization

Rahul Fernandes, Travis Desell

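AI summary: Proposes an end-to-end framework in which neural networks learn portfolio weights by directly optimizing differentiable surrogates of key financial metrics: Sharpe ratio, Omega ratio, Conditional Value-at-Risk (CVaR), and Risk Parity. On 50 S&P 500 stocks from 2007 to 2023, the best model (an AttentionLSTM with the Omega-CVaR-RiskParity loss) achieves an annualized Sharpe of 0.29 and a total compounded return of +7.86% in the 2022-2023 out-of-sample test, outperforming the S&P 500 by 12.38 percentage points.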

Abstract

Portfolio optimization in real-world financial markets is notoriously difficult due to non-stationarity, noisy data, and high transaction costs. Standard predict-then-optimize methods first forecast returns and then solve for weights, compounding prediction errors and often failing under regime shifts. We propose an end-to-end framework that directly optimizes differentiable surrogates of key financial metrics - Sharpe ratio, Omega ratio, Conditional Value-at-Risk (CVaR), and Risk Parity - allowing neural networks to learn portfolio weights via backpropagation. Our expanding-window walk-forward procedure, applied to 50 S&P 500 stocks from 2007 to 2023, incorporates realistic bid-ask spread costs and rebalances quarterly. On the challenging out-of-sample test period (2022-2023), the best model - an AttentionLSTM with the Omega-CVaR-RiskParity loss - achieves an annualized Sharpe of 0.29 and a total compounded return of +7.86%, while the S&P 500 delivers -4.52% total return and an annualized Sharpe of -0.02. This outperforms the S&P 500 by 12.38 percentage points (a relative improvement of over 270%), while keeping tail risk (CVaR) nearly unchanged. The framework consistently outperforms the equal-weight portfolio, S&P 500, and traditional methods (MVP, HRP, NCO), demonstrating that embedding financial objectives directly into model training yields robust, economically meaningful outperformance even in adverse market conditions.
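
For reference, three of the metrics named above can be computed from a series of periodic returns as in the minimal sketch below. These are plain textbook definitions with a zero risk-free rate and a zero Omega threshold assumed, not the authors' differentiable surrogates, and the function names are illustrative.

```python
import math


def sharpe_ratio(returns, periods_per_year=252):
    """Annualized mean return over sample standard deviation (risk-free rate 0)."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)


def omega_ratio(returns, threshold=0.0):
    """Sum of gains above the threshold divided by sum of shortfalls below it."""
    gains = sum(max(r - threshold, 0.0) for r in returns)
    losses = sum(max(threshold - r, 0.0) for r in returns)
    return gains / losses if losses > 0 else math.inf


def cvar(returns, alpha=0.95):
    """Mean loss over roughly the worst (1 - alpha) fraction of periods."""
    losses = sorted((-r for r in returns), reverse=True)
    k = max(1, int(round(len(losses) * (1 - alpha))))
    return sum(losses[:k]) / k


sample = [0.02, -0.01, 0.03, -0.04]
print(omega_ratio(sample), cvar(sample, alpha=0.5))
```

A training loss built on such metrics needs smooth approximations of the `max` and sorting steps, which is the role the abstract assigns to differentiable surrogates.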

2605.28851 2026-05-29 astro-ph.EP astro-ph.IM cs.LG physics.ao-ph

Towards a Foundation Model for the Martian Atmosphere

Sujit Roy, Udayshankar Nair, Yuling Wu, Georgios Priftis, Liping Wang, Anastasia Georgiou, Anne Jones, Björn Lütjens, Johannes Schmude, Campbell Watson, Rachel A. Slank, Ankur Kumar, Anirbit Mukherjee, Procheta Sen, Ramin Lolachi, Haonan Chen, Manil Maskey, Juan Bernabé-Moreno, Rahul Ramachandran

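AI summary: Motivated by challenges such as sparse Martian atmospheric data and high computational cost, the paper maps the design space for a data-driven foundation model, covering available data, physical models, downstream applications, and AI methods.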

Abstract

The Martian atmosphere hosts dynamical phenomena ranging from planet-encircling dust storms to mesoscale orographic clouds and nocturnal low-level jets. General circulation models can simulate these phenomena but are computationally expensive at the resolution needed to resolve mesoscale features. While assimilation of satellite remote sensing observations enables forecasting with such models, the observational record is often sparse, short, and fragmented across instrument generations. These constraints motivate the development of a data-driven foundation model for the Martian atmosphere. Foundation models live in a complex design landscape: there is an interplay between the available data, the physics of the underlying processes, and corresponding developments in AI. Even though the idea of a foundation model is to address multiple use cases in a data- and compute-efficient manner, it is important to have a clear picture of which applications can sensibly be addressed by a single model. The purpose of this paper is to elucidate this design landscape. We discuss available data ranging from atmospheric retrievals to reanalysis datasets as well as existing physical models. Moreover, we identify a wide range of candidate downstream applications. Finally, we consider relevant recent developments in artificial intelligence (AI) that can be leveraged in this context. Here, we put particular emphasis on AI models for atmospheric physics, data-driven approaches to data assimilation, and methods for working in a limited-data setting.

2605.28844 2026-05-29 cs.NE cs.LG

WASHH: An Anchor-Aware Whale-Guided Selection Hyper-Heuristic for Continuous Optimization and SVC Configuration

Yifu Zhao, Xiaofan Zou, Junhao Wei, Yanxiao Li, Baili Lu, Zhenhong Peng, Dexing Yao, Haochen Li, Qinbin He, Sio-Kei Im, Xu Yang, Yapeng Wang

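AI summary: Proposes WASHH, a hyper-heuristic that selects among multiple search behaviors with an online reward controller; it achieves the best average rank in continuous optimization and the lowest mean validation log loss in SVC hyperparameter configuration.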

Abstract

Learning-assisted algorithm design often has to make reliable search decisions under small evaluation budgets, where committing to a single metaheuristic can be unreliable. We propose WASHH, a Whale-guided Adaptive Selection Hyper-Heuristic for continuous black-box optimization. WASHH uses WOA as the main exploitation backbone, but treats PSO-style memory, GWO-style leader averaging, DE-style variation, local coordinate search, and anchor-guided refinement as selectable search behaviors. An online reward controller allocates evaluations according to observed improvements, while anchor refinement exploits inexpensive reference configurations such as box centers or default model settings without bypassing black-box evaluation. On ten 30-dimensional benchmark functions with 10 independent runs and 12,000 evaluations, WASHH achieves the best average rank, 1.10, and is best or tied best on all ten functions. It strictly improves over WOA on eight functions and ties WOA at the numerical optimum on Rastrigin and Griewank. We further study SVC hyperparameter configuration for breast cancer diagnosis under a 300-evaluation budget. WASHH obtains the lowest mean validation log loss among the compared optimizers, suggesting that anchor-aware selection hyper-heuristics are a practical lightweight direction for LEAD systems.
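
One way an online reward controller could allocate evaluations among search behaviors is sketched below. The abstract does not specify WASHH's update rule, so the exponentially decayed credit, the probability floor, the operator labels, and the `RewardController` class are all assumptions chosen for illustration.

```python
import random


class RewardController:
    """Select operators with probability tied to their recent improvements."""

    def __init__(self, operators, decay=0.8, floor=0.05):
        self.operators = list(operators)
        self.credit = {name: 1.0 for name in self.operators}
        self.decay = decay  # how quickly old rewards are forgotten
        self.floor = floor  # probability mass shared equally for exploration

    def probabilities(self):
        # A tiny offset keeps the weights positive even if all credit decays away.
        weights = {name: self.credit[name] + 1e-12 for name in self.operators}
        total = sum(weights.values())
        k = len(self.operators)
        return {
            name: self.floor / k + (1.0 - self.floor) * weights[name] / total
            for name in self.operators
        }

    def select(self, rng):
        probs = self.probabilities()
        return rng.choices(self.operators, weights=[probs[n] for n in self.operators])[0]

    def update(self, name, improvement):
        # Only observed improvements earn credit; worsening moves count as zero.
        reward = max(improvement, 0.0)
        self.credit[name] = self.decay * self.credit[name] + (1.0 - self.decay) * reward


controller = RewardController(["woa", "de_variation", "local_search"])
controller.update("de_variation", improvement=5.0)
print(controller.probabilities())
```

After one rewarded step the improved operator is chosen more often, while the floor keeps every behavior selectable so that a temporarily unlucky operator can recover.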
