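arXivDaily daily academic digest: synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and related fields.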

2605.31065 2026-06-01 eess.SP cs.AI

DRIFT: Joint Channel Estimation and Prediction Towards Pilotless 6G Non-Terrestrial Networks

Bruno De Filippo, Carla Amatetti, Alessandro Vanelli-Coralli

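AI summary: To address the high pilot overhead and limited onboard compute in 6G LEO satellite networks, the paper proposes DRIFT, a lightweight joint channel estimation and prediction framework that transmits pilots only in the initial slot and handles subsequent slots with data-driven processing, achieving up to 12% spectral efficiency gain at low computational complexity.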

Comments: Submitted for publication

Abstract

Non-terrestrial networks (NTNs) are expected to play a pivotal role in sixth-generation (6G) systems by enabling ubiquitous connectivity and massive communication. In this context, channel prediction emerges as a key technique to improve the spectrum utilization efficiency by limiting the pilot overhead. However, many proposed predictors based on artificial intelligence (AI) are characterized by high inference complexity, posing challenges to onboard implementation. In this paper, we address the challenge of designing accurate yet computationally efficient channel prediction techniques tailored to low Earth orbit (LEO) NTNs, where strict power constraints limit model complexity, to enable spectral efficiency gains. We propose an iterative joint channel estimation and prediction framework in the context of 6G NTNs that significantly reduces pilot overhead by transmitting pilots only in the initial slot and relying on data-driven processing for subsequent slots. We introduce Data-driven Refinement and Iterative Forecast for wireless channel Tracking (DRIFT), a lightweight architecture that refines data-aided channel estimates and predicts future channel frequency responses with low computational cost and reduced error propagation. Two predictor variants based on convolutional and long short-term memory layers are investigated. Simulation results in an end-to-end simulation of an uplink LEO NTN scenario show that the proposed approach achieves up to 12% spectral efficiency gain compared to conventional pilot-based systems, with robustness to training-test mismatches and consistent performance across different channel models. Moreover, DRIFT requires fewer than 200k multiply-accumulate operations, making it suitable for on-board satellite implementation under stringent power constraints.
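
To make the pilot-overhead argument concrete, the following minimal sketch compares the fraction of symbols left for data when pilots are sent in every slot versus only in the initial slot. The slot layout (`symbols_per_slot`, `pilot_symbols`, `n_slots`) and the function names are illustrative assumptions, not values or code from the paper, and the result is an idealized upper bound that ignores any loss caused by channel prediction errors.

```python
def data_fraction(symbols_per_slot, pilot_symbols, n_slots, pilots_every_slot):
    """Fraction of transmitted symbols that carry data (hypothetical slot layout)."""
    total = symbols_per_slot * n_slots
    # Conventional scheme: pilots in every slot. Pilot-light scheme: first slot only.
    pilots = pilot_symbols * (n_slots if pilots_every_slot else 1)
    return (total - pilots) / total


def idealized_overhead_gain(symbols_per_slot, pilot_symbols, n_slots):
    """Relative gain in data fraction from sending pilots only in the initial slot.

    Assumes both schemes achieve the same throughput per data symbol, which is
    an optimistic simplification made only for this illustration.
    """
    conventional = data_fraction(symbols_per_slot, pilot_symbols, n_slots, True)
    first_slot_only = data_fraction(symbols_per_slot, pilot_symbols, n_slots, False)
    return first_slot_only / conventional - 1.0


if __name__ == "__main__":
    # Hypothetical numbers: 14 symbols per slot, 2 pilot symbols, 10 slots.
    print(f"idealized gain: {idealized_overhead_gain(14, 2, 10):.3f}")
```

The gain reported in the abstract comes from end-to-end simulation; this arithmetic only shows where the headroom comes from.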

2605.31064 2026-06-01 cs.IR cs.AI

Fighting Numerical Hallucinations via Data-centric Compilation for Online Financial QA

Hao Chen, Xing Tang, Qirui Liu, Weijie Shi, Shiwei Li, Fuyuan Lyu, Weihong Luo, Xiku Du, Xiuqiang He

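AI summary: Proposes the Data-centric Reasoning Compiler (DCRC), which combines adversarial data construction, multi-stage training, and a compile-and-execute inference process to address the noise sensitivity, calculation fragility, and auditability crisis that retrieval-augmented generation faces in online financial QA, enabling reliable numerical reasoning.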

Comments: Accepted by KDD 2026 ADS track

Abstract

Large Language Models (LLMs) have significantly advanced online data services, particularly in the domain of financial question answering (FinQA). However, such systems remain susceptible to numerical reasoning hallucinations, which critically undermine reliability in high-stakes financial applications. Although retrieval-augmented generation (RAG) has been widely adopted to ground responses in external knowledge, it introduces three persistent challenges: noise sensitivity, calculation fragility, and an auditability crisis. Existing model-centric approaches, which primarily focus on optimizing either the retriever or generator in isolation, still struggle to address these issues in an integrated manner. In this work, we pioneer a data-centric paradigm and propose a novel framework, the Data-centric Reasoning Compiler (DCRC). The framework operates through three cohesive phases: (1) adversarial data construction, which synthesizes training examples with controlled noise to teach robustness; (2) multi-stage training that cultivates a Data-centric Structuring Agent (DSA) capable of explicit evidence auditing and program synthesis; and (3) a compile-and-execute inference process, where the DSA transforms user queries and retrieved documents into verifiable, executable reasoning programs. This data-driven framework ensures faithful numerical reasoning by design. We conduct extensive experiments on established offline benchmarks and further validate our framework through deployment in a real-world online financial QA system.

2605.31063 2026-06-01 stat.ML cs.LG physics.chem-ph physics.comp-ph

Free energy Estimation on Any State Space

Jiajun He, Zijing Ou, Francisco Vargas, Yingzhen Li, José Miguel Hernández-Lobato, Carles Domingo-Enrich, Yuanqi Du

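AI summary: Proposes a framework based on generalized neural transport learning that extends free energy estimation to arbitrary state spaces, and reveals a group-theoretic structure linking time reversal and Doob's h-transforms.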

Abstract

Free energy estimation is a fundamental yet challenging problem, from physics to statistics. Classical approaches rely on thermodynamic transformations, ranging from direct estimation, quasistatic integration, to finite-time averaging. Recent work [He and Du et al., 2025] learns neural transports to significantly accelerate the efficiency in the finite-time regime. In this paper, we generalize this framework to arbitrary state spaces. Building on this view, we develop a generalized neural transport learning approach for efficient estimation. Experiments validate the effectiveness and efficiency of the proposed method beyond continuous settings, extending to discrete and multimodal spaces as well as autoregressive settings. Beyond free energy estimation, we establish algebraic identities and reveal a group-theoretic structure linking infinitesimal time reversal and generalized Doob's $h$-transforms, showing that their compositions form a generalized dihedral group.

2605.31062 2026-06-01 cs.CL

AdaptR1: Reinforcement Learning Based Adaptive Interleaved Thinking in Multi-hop Question Answering

Yuxin Wang, Jiahao Lu, Qifeng Wu, Shicheng Fang, Chuanyuan Tan, Yining Zheng, Xuanjing Huang, Xipeng Qiu

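AI summary: Proposes AdaptR1, a framework that uses reinforcement learning to dynamically allocate the reasoning budget at each step, reducing overthinking in multi-hop question answering and substantially lowering inference cost while maintaining performance.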

Abstract

Large Language Models (LLMs) have achieved remarkable performance in complex reasoning tasks through Chain-of-Thought (CoT) prompting. However, this approach often leads to "over-thinking," where models generate unnecessarily long reasoning traces for simple queries and incur avoidable inference cost. While recent work has explored adaptive reasoning, existing methods typically make a single query-level decision about whether to reason. This overlooks the dynamic nature of multi-step tasks, where the need for explicit reasoning varies across intermediate stages. To address this limitation, we introduce AdaptR1, a Reinforcement Learning (RL) based framework for adaptive interleaved thinking in multi-hop Question Answering (QA). Unlike previous approaches that require Supervised Fine-Tuning (SFT) for cold-start initialization, AdaptR1 uses a fully RL-based strategy with a quality-gated efficiency reward to dynamically allocate reasoning budgets at each step. Under the Graph-R1 setting, AdaptR1 reduces average think tokens by 69.71%, with a 90.35% reduction on HotpotQA, while maintaining performance comparable to or better than standard baselines. Furthermore, our analysis reveals that overthinking in multi-hop reasoning is not uniformly distributed but occurs predominantly during the initial planning stages, highlighting the effectiveness of step-wise adaptive budget allocation.
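
One way to picture a quality-gated efficiency reward is a bonus for short reasoning that is paid out only when the answer meets a quality bar. The sketch below is a hypothetical form written for illustration; the function name, the linear efficiency term, and the parameters `token_budget`, `quality_threshold`, and `efficiency_weight` are assumptions, not the reward defined in the paper.

```python
def quality_gated_reward(answer_score, think_tokens, token_budget,
                         quality_threshold=1.0, efficiency_weight=0.5):
    """Hypothetical reward: task score plus an efficiency bonus gated on quality.

    answer_score      -- task reward for the final answer (e.g. 1.0 if correct)
    think_tokens      -- number of reasoning tokens the policy generated
    token_budget      -- token count at which the efficiency bonus reaches zero
    """
    # Efficiency is 1.0 with no thinking and falls linearly to 0.0 at the budget.
    efficiency = max(0.0, 1.0 - think_tokens / token_budget)
    # The gate blocks the bonus when quality is insufficient, so the policy
    # cannot earn reward by answering badly but briefly.
    gate = 1.0 if answer_score >= quality_threshold else 0.0
    return answer_score + efficiency_weight * gate * efficiency


if __name__ == "__main__":
    print(quality_gated_reward(1.0, 50, 100))   # correct answer, half the budget
    print(quality_gated_reward(0.0, 0, 100))    # wrong answer, no bonus
```

The gate is the design point: without it, an efficiency term alone would reward skipping reasoning even on steps that need it.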

2605.31061 2026-06-01 cs.LG cs.AI

STEP: Learning STructured Embeddings for Progressive Time Series

Lucas Thil, Jesse Read, Rim Kaddah, Guillaume Doquet

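AI summary: Proposes a self-supervised contrastive learning method that builds a low-dimensional manifold geometry anchored by fixed orthogonal prototype vectors, enabling end-state prediction, multi-step forecasting, and interpretable phase separation for progressive time series.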

Abstract

We present a novel method for learning interpretable representations of progressive time series, that is, data capturing irreversible state transitions such as degradation or task completion. Our approach uses a self-supervised contrastive objective to learn a low-dimensional latent space whose geometry is itself the interpretation: each observation becomes a point on a manifold anchored between two fixed orthogonal prototype vectors, and a trajectory becomes a path across that manifold. From this structure we read a latent compass, the polar coordinates (θ, r) of the latent vector, in which θ tracks the progression of the underlying state (e.g., from healthy to failed) and r identifies the active mode (e.g., the operating condition), without any proxy labels. We evaluate the approach against the state of the art on diverse domains, including industrial degradation, robotic tasks, and neural activity, validating three key capabilities: (1) end-state prediction, (2) multi-step forecasting, and (3) interpretable phase separation. Our method matches or improves over black-box counterparts on all of these while providing transparency about the underlying mechanisms. A simple linear regressor on top of the latent compass coordinates is competitive with deep architectures, direct quantitative evidence that the underlying state is encoded in a geometrically accessible form.
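
The "latent compass" readout can be illustrated with a few lines of arithmetic. The sketch below assumes a 2-D latent vector and takes the two orthogonal prototypes to be the coordinate axes (1, 0) and (0, 1); both choices, and the function names, are assumptions made for illustration rather than details confirmed by the abstract.

```python
import math


def latent_compass(z):
    """Polar coordinates (theta, r) of a 2-D latent vector z = (x, y)."""
    x, y = z
    r = math.hypot(x, y)        # radius: read in the abstract as the active mode
    theta = math.atan2(y, x)    # angle: read in the abstract as state progression
    return theta, r


def progression(z):
    """Angle rescaled to [0, 1] between the two assumed prototypes.

    0.0 means aligned with the first prototype (e.g. "healthy"),
    1.0 means aligned with the second (e.g. "failed").
    """
    theta, _ = latent_compass(z)
    return theta / (math.pi / 2)


if __name__ == "__main__":
    for z in [(1.0, 0.0), (1.0, 1.0), (0.0, 2.0)]:
        theta, r = latent_compass(z)
        print(z, round(theta, 3), round(r, 3), round(progression(z), 3))
```

Because both coordinates are simple functions of the latent vector, a linear regressor on (θ, r) is a natural probe of how accessible the state is, which matches the abstract's closing claim.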

2605.31058 2026-06-01 cs.CL cs.SE

Combinatorial Synthesis: Scaling Code RLVR via Atomic Decomposition and Recombination

Jiasheng Zheng, Boxi Cao, Boxi Yu, Yuzhong Zhang, Jialun Cao, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun

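AI summary: Proposes the Atomic Decomposition and Recombination (ADR) framework, which decomposes code tasks into atomic elements and recombines them in a controlled way to generate novel, challenging, verifiable code tasks, addressing the scarcity and limited scalability of RLVR training data; experiments show significant gains in code ability across several downstream domains.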

Comments: Work in progress

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has recently emerged as the cornerstone for shaping the remarkable coding abilities of Large Language Models (LLMs). However, the scalability of RLVR is severely constrained by the scarcity of sufficiently challenging verifiable code tasks that target near the model's edge of competence. Prior studies often rely on heuristic seed expansions for data synthesis, which severely limits both novelty and difficulty. Consequently, the training value of such data fails to scale proportionally with the size of its synthesis. To this end, we propose Atomic Decomposition and Recombination (ADR), a novel framework that generates verifiable code tasks via decomposition into atomic elements and controlled recombination, thereby enabling the generation of genuinely novel and challenging verifiable code tasks. Experiments and analysis demonstrate that ADR achieves superior originality, difficulty, diversity, and test quality over existing baselines, and consistently delivers greater improvements in code ability across RLVR in diverse downstream domains, including algorithmic programming, tool usage, and data science. Our work sheds light on a new paradigm for novel code task synthesis and scalable RLVR training.

2605.31057 2026-06-01 cs.CV cs.LG

LVSA: Training-Free Sparse Attention for Long Video Diffusion

Gael Glorian, Ioannis Lamprou, Zhen Zhang, Yujie Yuan, Hongsheng Liu

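AI summary: Proposes LVSA, a training-free, model-agnostic block-sparse attention method that combines a structured window pattern with rotating global anchors, lowering the compute cost of long-video diffusion inference while removing fixed-grid bias and supporting generation beyond the training horizon.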

Comments: 10 pages, 5 figures, 4 tables. Code: https://github.com/JiusiServe/LongVideoSparseAttention

Abstract

Dense self-attention is the compute and quality bottleneck of long-video diffusion inference: cost grows quadratically with the sequence length, and beyond the training horizon the model converges to near-static output, that is, "frozen" repetitive video. State of the art approaches are either too costly, e.g., they require retraining, or fail to satisfy both performance and quality objectives in a scalable manner. To this end, we introduce Long Video Sparse Attention (LVSA), a training-free model-agnostic block-sparse attention for video diffusion transformers that combines a structured window pattern with rotating global anchors, thus removing the fixed-grid bias which causes long-range temporal artifacts. LVSA, combined with a FlashInfer kernel, reduces compute up to 3.17x on Wan 2.1 1.3B at a 6x horizon, 2.98x on Wan 2.1 14B at a 6x horizon, and 3.33x on HunyuanVideo 1.5 at a 1.5x horizon, compared to dense attention. Beyond reducing compute, LVSA enables HunyuanVideo 1.5 generation at a 2x horizon, which is otherwise out-of-memory on a single GPU. Moreover, LVSA provides speedups up to 2.41x compared to RIFLEx and 3.27x compared to UltraViCo on Wan 2.1 1.3B. To demonstrate applicability across diverse platforms, we apply LVSA on NPUs and achieve speedups up to 2.71x on Wan 2.2 A14B and 3.24x on Wan 2.1 1.3B compared to dense attention. To evaluate quality in a fair way, we introduce VQeval, a tool properly scoring loopy video failures, which instead are rewarded in state of the art evaluators like VBench-Long. LVSA is quality-neutral for generation at training horizon length and quality-positive at extended lengths.
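
The idea of combining a local window with anchors that move can be sketched as a block-level boolean mask. Everything below is a toy illustration under stated assumptions: the 1-D block layout, the parameters `window`, `anchor_stride`, and `rotation`, and the rule that anchors shift by one block per rotation step are hypothetical, and the actual LVSA pattern and its FlashInfer integration may differ.

```python
def rotating_anchors(n_blocks, anchor_stride, rotation):
    """Block indices used as global anchors, shifted by `rotation` (assumed rule)."""
    return {(rotation + k) % n_blocks for k in range(0, n_blocks, anchor_stride)}


def block_sparse_mask(n_blocks, window, anchor_stride, rotation):
    """mask[q][k] is True when query block q may attend to key block k."""
    anchors = rotating_anchors(n_blocks, anchor_stride, rotation)
    return [
        # Keep key blocks inside the local window, plus the current anchors.
        [abs(q - k) <= window or k in anchors for k in range(n_blocks)]
        for q in range(n_blocks)
    ]


def density(mask):
    """Fraction of block pairs that are actually computed."""
    total = sum(len(row) for row in mask)
    kept = sum(sum(row) for row in mask)
    return kept / total


if __name__ == "__main__":
    for rotation in range(3):  # e.g. one rotation per layer or denoising step
        mask = block_sparse_mask(8, 1, 4, rotation)
        print(rotation, sorted(rotating_anchors(8, 4, rotation)), density(mask))
```

Rotating the anchor set means no fixed grid of key blocks is always favored, which is the intuition behind removing fixed-grid bias while keeping the mask sparse.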

2605.31056 2026-06-01 cs.CL

How Much Do LLMs Know About Chinese Zero Pronouns?

Yifei Li, Guanyi Chen, Tingting He

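AI summary: Systematically evaluates how well large language models handle Chinese zero pronouns through a series of linguistically motivated tasks (identification, referentiality classification, reference-type classification, resolution, and translation), finding that current LLMs still struggle considerably, especially on upstream tasks such as identification and referentiality classification.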

2605.31053 2026-06-01 cs.SD cs.AI

AnchorSteer: Self-Discovered Concept Injection for Structure-Preserving Music Editing

Chih-Heng Chang, Keng-Seng Ho, Chih-Yu Tsai, Kuan-Lin Chen, Yi-Hsuan Yang, Jian-Jiun Ding

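AI summary: Proposes AnchorSteer, a framework that disentangles semantic-structural entanglement by pairing structural anchoring with self-discovered semantic injection, enabling significant semantic transformations with high-fidelity structural preservation.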

Comments: Accepted by the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2026)

Abstract

Controllable music editing is to modify high-level attributes while strictly preserving rhythmic and melodic structures. However, this task is challenged by a semantic-structural entanglement: steering methods often degrade structure to achieve editing performance, while structural adaptors suppress semantic responsiveness. We propose AnchorSteer, a framework that disentangles this tension by coupling structural anchoring with self-discovered semantic steering. The proposed approach probes internal representations to extract interpretable, label-free concept vectors via a self-supervised reconstruction objective, isolating attributes without curated data. During editing, these portable, plug-and-play concept vectors are injected into diffusion hidden manifolds while a structural adaptor enforces consistency. Variants for unconditioned and conditioned injections are provided to balance robustness and semantic strength. Experiments on ZoME-Bench and subjective tests show that the proposed framework outperforms both steering-only and anchoring-only baselines, enabling significant semantic transformations with high-fidelity structural preservation.

2605.31050 2026-06-01 cs.LG

Best-Arm Identification-Based Trust Region Selection for Bayesian Optimization on Multimodal Functions

Nobuo Namura, Sho Takemori

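AI summary: Proposes a trajectory-aware framework that combines best-arm identification with trust region-based Bayesian optimization, predicting the final performance of local optimizers and progressively eliminating suboptimal candidates to accelerate global optimization of multimodal functions.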

Comments: 19 pages, 13 figures

Abstract

Gaussian process-based Bayesian optimization (BO) is a popular approach for expensive black-box optimization, but its performance often degrades on complex multimodal or high-dimensional problems. Trust region-based BO mitigates this issue by focusing on local regions, and recent studies suggest that selecting an effective region can be formulated as a multi-armed bandit problem. We propose a trajectory-aware framework that integrates best-arm identification (BAI) with trust region-based BO to efficiently solve multimodal optimization problems. Our method extrapolates the optimization trajectories of multiple locally initialized optimizers to predict their final performance and progressively eliminates suboptimal candidates via BAI. We theoretically show that the proposed BAI-guided BO converges faster to the global optimum than conventional BO under mild assumptions, and demonstrate its effectiveness through extensive experiments on synthetic and real-world benchmarks.
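
The elimination step described above can be sketched as follows. This is a deliberately naive stand-in: it treats each local optimizer as an arm with a history of best-so-far objective values (minimization), extrapolates linearly from the last two points, and drops the arm with the worst predicted final value. The linear extrapolation rule and all names are assumptions for illustration; the paper's trajectory model and BAI procedure are not specified in the abstract.

```python
def extrapolate_final(history, horizon):
    """Predict the best-so-far value at `horizon` evaluations (naive linear rule)."""
    if len(history) < 2:
        return history[-1]
    # Best-so-far values never get worse, so only a non-positive slope is used.
    slope = min(history[-1] - history[-2], 0.0)
    remaining = max(horizon - len(history), 0)
    return history[-1] + slope * remaining


def eliminate_worst(histories, horizon):
    """Drop the arm with the worst predicted final value.

    histories -- dict mapping arm name to its list of best-so-far values
    Returns (surviving histories, predictions for all arms).
    """
    predicted = {name: extrapolate_final(h, horizon) for name, h in histories.items()}
    worst = max(predicted, key=predicted.get)
    survivors = {name: h for name, h in histories.items() if name != worst}
    return survivors, predicted


if __name__ == "__main__":
    histories = {
        "a": [5.0, 4.0, 3.0],   # still improving quickly
        "b": [2.5, 2.5, 2.5],   # currently best but stalled
        "c": [6.0, 5.9, 5.8],   # poor and improving slowly
    }
    survivors, predicted = eliminate_worst(histories, horizon=6)
    print(sorted(survivors), predicted)
```

In this toy run, arm "a" is currently behind "b" but is predicted to finish ahead of it, which is the kind of decision a trajectory-aware rule can make and a rule based only on current values cannot.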

2605.31049 2026-06-01 cs.LG cs.AI cs.LO

Learning to Solve and Optimize by Evolving Code

Veronika Semmelrock, Benedetta Strizzolo, Francesco Zuccato, Gerhard Friedrich, Patrick Rodler, Konstantin Schekotihin

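AI summary: Introduces the tool CHECKMATE, which uses a formal specification to ensure solution correctness and a natural language description to guide code evolution, automatically generating algorithms that outperform state-of-the-art solvers on configuration and scheduling problems.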

Comments: Preprint of a paper accepted to IJCAI26

Abstract

Combinatorial and optimization problems are fundamental to many industrial AI applications. Solving large-scale real-world instances of such problems typically requires careful problem formalization, specialized solvers, and expert-designed heuristics. Thus, experts need to specify not only what solutions are, but also how they are derived. By introducing the tool CHECKMATE, we show that algorithm generation via code evolution represents a paradigm shift by eliminating the need to formulate the how. CHECKMATE solely relies on the what. Specifically, a formal specification ensures solutions' correctness and enables systematic performance evaluation of the generated programs, while a natural language description guides the evolutionary process. The effectiveness of our method is demonstrated on selected problems from two industrial domains: configuration and scheduling. In all cases, the evolved algorithms consistently outperform state-of-the-art solvers. This underscores the potential of formal methods in guiding code evolution for automatically solving complex real-world problems.

2605.31048 2026-06-01 cs.CV

Rethinking Efficient Crack Segmentation with Task-Aligned Structural-Directional Modeling

Shipeng Liu, Liang Zhao, Dengfeng Chen, Weihua Zhang

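AI summary: Treats crack segmentation as sparse structural recovery and proposes the RIFT models, which preserve local evidence, aggregate directional continuity, and restore crack structures through lightweight multi-scale fusion, achieving the best or tied-best results on 16 metrics.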

Abstract

Recent crack segmentation methods often follow generic semantic segmentation designs, using stronger backbones, hybrid CNN-Transformer-Mamba encoders, and auxiliary enhancement branches. Although effective, this raises whether stronger generic feature mixing is the most suitable direction for crack segmentation. We instead formulate crack segmentation as sparse structural recovery. Cracks have limited category-level semantics but strong morphological regularities, being thin, sparse, anisotropic, locally fragmented, and easily confused with textures or shadows. Thus, the key bottleneck lies in preserving weak structural evidence, recovering directional continuity, and suppressing background coupling. We propose RIFT, a compact family of morphology-aligned crack segmentation models. Rather than compressing a complex generic architecture, RIFT is simple by design, preserving local evidence, aggregating cooperative directional continuity, and restoring crack structures through lightweight multi-scale fusion. Experiments on four public benchmarks show that RIFT achieves the best or tied-best results across the 16 main metrics against reproduced representative baselines. RIFT-B gives the strongest overall accuracy, while RIFT-T provides the best deployment efficiency with only 0.47M parameters and high inference speed. Topology-aware evaluation, ablations, transfer experiments, and visualizations further verify that task-aligned simplicity can match or surpass complex hybrid architectures when its inductive bias fits crack morphology. Code: https://github.com/xauat-liushipeng/RIFT

2605.31044 2026-06-01 cs.LG

The Challenges of Using Reinforcement Learning for Controlling Industrial Energy Systems

Tobias Lademann, Théo Vincent, Jan Peters, Matthias Weigold

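AI summary: Using a thermal heating network as a case study, this paper examines the challenges of deploying reinforcement learning in real industrial energy systems, including partial observability, action space design, reward design, and the sim-to-real gap; based on a real deployment, it finds that reinforcement learning can achieve stable operation but leaves a performance gap.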

2605.31043 2026-06-01 stat.ML cs.AI cs.LG

Routing on the Stiefel Manifold: When Does Adaptive Subspace Selection Help for Cross-Domain EEG Decoding?

Isabella Costa Maia, Pedro L. C. Rodrigues, Salem Said, Marco Congedo

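AI summary: To address the domain shift of covariance matrices in cross-domain EEG decoding, proposes dynamic Stiefel routing, which performs adaptive subspace selection via a pool of expert projection filters on the Stiefel manifold and a cross-attention mechanism, and introduces three structural properties that prevent collapse to ensemble averaging, yielding consistent gains on three datasets.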

Abstract

Cross-domain EEG decoding remains challenging despite advances in Riemannian deep learning: covariance matrices from different subjects occupy systematically distinct regions of the SPD manifold, yet existing domain adaptation methods either require target-domain calibration data or learn subject-specific components that cannot generalise across domains. We propose dynamic Stiefel routing: a pool of $K$ expert projection filters on the Stiefel manifold, each specialised for a different region of the SPD manifold, with each input covariance routed to the most appropriate filter via cross-attention, adapting the subspace projection per sample. A central finding is that this approach, implemented naively, provably collapses to ensemble averaging: when routing weights are uniform, the adaptive filter reduces exactly to an equal-contribution combination of experts, indistinguishable from a single fixed filter. Three structural properties break this degeneracy: a symmetric anchor $W_{\mathrm{base}} \in \mathrm{St}(n,k)$ that removes proximity bias among experts; a frozen domain-discriminative query encoder that decouples routing from task optimisation; and a decoupled key alignment loss that trains expert keys toward stable domain attractors. Together they produce the first genuinely committed and domain-structured routing on SPD manifolds, with consistent gains across three datasets: balanced accuracy improves from $0.773\to 0.823$, $0.757\to 0.809$, and $0.801\to 0.839$, with the alignment strategy determined automatically by a single data-driven rule and no dataset-specific hyperparameter search.
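
The collapse to ensemble averaging is easy to reproduce in a toy setting. The sketch below mixes expert filters with softmax routing weights and shows that equal routing scores give the same equal-contribution combination for any input. It is a simplified illustration: the filters are plain nested lists, the mixing rule is a weighted sum, and the anchor $W_{\mathrm{base}}$, cross-attention scoring, and any re-orthonormalisation onto the Stiefel manifold are omitted; all names are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of routing scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_filters(experts, scores):
    """Weighted sum of expert filter matrices using softmax routing weights.

    experts -- list of K matrices (lists of rows) with identical shapes
    scores  -- K routing scores for one input sample
    """
    weights = softmax(scores)
    rows, cols = len(experts[0]), len(experts[0][0])
    return [
        [sum(w * expert[i][j] for w, expert in zip(weights, experts))
         for j in range(cols)]
        for i in range(rows)
    ]


if __name__ == "__main__":
    experts = [[[1.0], [0.0]], [[0.0], [1.0]]]   # two toy 2x1 filters
    print(mix_filters(experts, [0.0, 0.0]))      # uniform routing
    print(mix_filters(experts, [3.0, 3.0]))      # still uniform: same result
    print(mix_filters(experts, [10.0, 0.0]))     # committed routing
```

Whenever the scores are equal, the output does not depend on the input sample, so the "adaptive" filter behaves like one fixed filter; routing only adds capacity when the weights commit to different experts for different inputs.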

2605.31042 2026-06-01 cs.CR cs.AI cs.CL

From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors

Jiejun Tan, Zhicheng Dou, Xinyu Yang, Yuyang Hu, Yiruo Cheng, Xiaoxi Li, Ji-Rong Wen

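AI summary: Presents ClawTrojan, a benchmark that exposes multi-step trojan attacks in local agentic harnesses, and DASGuard, a defense that scans control-like text, traces its origin, and removes untrusted control content to provide dynamic defense.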

Comments: Code and data are available at https://github.com/RUC-NLPIR/ClawTrojan

Abstract

LLM agents are evolving from conversational chatbots to operational tools in real-world workspaces. In local agentic harnesses, an LLM can read and write files, call tools, and reuse workspace state across sessions. While such capabilities enhance utility, they also expose a new attack surface for attackers. Attackers can embed a prompt injection within a file or tool output. Agents may read this hidden instruction, store it, and execute it later. In this multi-step trojan attack paradigm, no individual step appears malicious on its own, but these steps can collectively turn untrusted text into persistent control content. However, existing defenses often inspect each step in isolation. As a result, they can block a clear harmful action, but fail to detect the earlier write operation that plants the backdoor. To reveal this threat, we introduce ClawTrojan, a benchmark designed to identify multi-step trojan attacks in local agentic harnesses. In an OpenClaw-style simulated workspace with GPT-5.4, ClawTrojan reaches a 95.5% attack success rate (ASR), while existing single-turn prompt-injection attacks produce near-zero ASR on the same model. To address this threat, we propose DASGuard, which scans control-like text in sensitive local files, traces its origin, and removes control content that does not originate from a trusted source. Our results show that DASGuard achieves strong dynamic defense by combining runtime attack blocking with sanitized commits to the workspace.

2605.31041 2026-06-01 cs.CV cs.AI

Does Visual Information Play a Decisive Role in Vision-Language-Action Model Driving Behavior?

Jingtao He, Hongliang Lu, Xiaoyun Qiu, Yixuan Wang, Xinhu Zheng

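AI summary: Proposes a structured multi-level visual perturbation framework to systematically analyze how much VLA driving models depend on visual information, revealing that dependency patterns vary with the evaluation mode and are uneven across abstraction levels.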

Abstract

Vision-Language-Action (VLA) models have demonstrated promising capability in autonomous driving, highlighting the potential of unified multimodal architectures for jointly modeling perception and planning. However, how current VLA-based driving behavior is grounded in visual information remains poorly understood. Existing evaluation protocols mainly focus on aggregate performance metrics, lacking structured and practical diagnostics to quantify visual-behavior dependency. In this work, we introduce a structured multi-level visual perturbation framework to analyze visual-behavior dependency in VLA-based driving models systematically. The framework organizes controlled visual perturbations along three complementary dimensions: channel-level degradation, information-level disruption, and structure-level modification. We apply it to VLA-based driving systems and evaluate behavioral responses under both open-loop trajectory prediction and interactive closed-loop safety evaluation. Experimental results reveal evaluation-dependent dependency patterns and uneven visual grounding across abstraction levels. These findings call for more structured analyses and principled design of VLA driving models to better understand how visual information shapes behavior and develop safer, more robust systems.

2605.31040 2026-06-01 cs.LG

UniRTL: Unifying Code and Graph for Robust RTL Representation Learning

Yi Liu, Hongji Zhang, Lei Chen, Mingxuan Yuan, Qiang Xu

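AI summary: Proposes UniRTL, a multimodal pretraining framework that jointly leverages RTL code and control data flow graphs through mutual masked modeling and a hierarchical training strategy to achieve fine-grained alignment, outperforming prior methods on performance prediction and code retrieval.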

Comments: Forty-Third International Conference on Machine Learning (ICML 2026)

Abstract

Developing effective representations for register transfer level (RTL) designs is crucial for accelerating the hardware design workflow. Existing approaches, however, typically rely on a single data modality, either the RTL code or its associated graph-based representation, limiting the expressiveness and generalization ability of the learned representations. For RTL, the control data flow graph (CDFG) offers a comprehensive structural representation that preserves complete information, while the code modality explicitly encodes semantic and functional information. We argue that integrating these complementary modalities is essential for a thorough understanding of RTL designs. To this end, we propose UniRTL, a multimodal pretraining framework that learns unified RTL representations by jointly leveraging code and CDFG. UniRTL achieves fine-grained alignment between code and graph through mutual masked modeling and employs a hierarchical training strategy that incorporates a pretrained graph-aware tokenizer and staged alignment of text (i.e., functional summary) and code prior to graph integration. We evaluate UniRTL on two downstream tasks, performance prediction and code retrieval, under multiple settings. Experimental results show that UniRTL consistently outperforms prior methods, establishing it as a more robust and powerful foundation for advancing hardware design automation.


2605.31036 2026-06-01 cs.GT cs.LG

Model Monotonicity in Autobidding Auctions: When Do Better Predictions Lead to Better Outcomes?

自动竞价拍卖中的模型单调性：更好的预测何时带来更好的结果？

Ashwinkumar Badanidiyuru

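AI summary: Studies the interaction between recommender system model quality, auction format, and autobidder behavior in online advertising, defines model improvement via cluster refinement, and systematically characterizes when evaluation metrics are monotone across bidder types, auction formats, and budget constraints.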


Journal ref: ICML 2026

AI中文摘要

在线广告平台依赖机器学习模型预测点击率（pCTR）和转化率（pCVR）以用于拍卖机制。我们引入了一个新框架来研究推荐系统模型质量、拍卖格式和自动竞价者行为之间的相互作用。我们形式化了模型改进——通过受概率论中滤子启发的精炼关系定义——何时导致平台级评估指标（如收入、福利或流动性福利）的改进。我们的主要贡献是：（1）基于聚类精炼的模型改进的形式化定义，以及（2）跨不同竞价者类型（tCPA、max-CPA）、拍卖格式（第一价格、第二价格、VCG）和预算约束的ECM单调性的系统刻画。我们证明，具有统一竞价的第一价格拍卖保证了无预算的tCPA竞价者的收入单调性（通过Jensen不等式），而第二价格拍卖和预算约束可能破坏这一性质。我们为非单调性结果提供了完整的数值构造。我们的发现对寻求将模型改进与业务成果对齐的广告平台具有实际意义。

英文摘要

Online advertising platforms rely on machine learning models to predict click-through rates (pCTR) and conversion rates (pCVR) for auction mechanisms. We introduce a novel framework to study the interaction between recommender system model quality, auction format, and autobidder behavior. We formalize when model improvements -- defined via a refinement relation inspired by filtrations in probability theory -- lead to improvements in platform-level Evaluation Criteria Metrics (ECM) such as revenue, welfare, or liquid welfare. Our main contributions are: (1) a formal definition of model improvement based on cluster refinement, and (2) a systematic characterization of ECM monotonicity across different combinations of bidder types (tCPA, max-CPA), auction formats (first-price, second-price, VCG), and budget constraints. We show that first-price auctions with uniform bidding guarantee revenue monotonicity for tCPA bidders without budgets (via Jensen's inequality), while second-price auctions and budget constraints can break this property. We provide full numerical constructions for the non-monotonicity results. Our findings have practical implications for advertising platforms seeking to align model improvements with business outcomes.
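The Jensen-type argument for first-price auctions can be illustrated with a stylized sketch. It assumes each tCPA bidder bids its target times the predicted conversion rate with a fixed bid multiplier of 1, which ignores the equilibrium bid scaling a full analysis would need. The function names, the cluster encoding, and the numbers are hypothetical and are not taken from the paper.

```python
def first_price_revenue(predictions, targets):
    """Sum over impressions of the highest bid, where bid = target * prediction."""
    revenue = 0.0
    for per_bidder in predictions:
        revenue += max(t * p for t, p in zip(targets, per_bidder))
    return revenue


def coarsen(true_rates, clusters):
    """Model output when every impression in a cluster gets the cluster-average rate."""
    n_bidders = len(true_rates[0])
    out = [None] * len(true_rates)
    for cluster in clusters:
        avg = [sum(true_rates[i][b] for i in cluster) / len(cluster)
               for b in range(n_bidders)]
        for i in cluster:
            out[i] = avg
    return out


# Four impressions, two bidders: per-impression conversion rates (hypothetical).
RATES = [(0.10, 0.02), (0.02, 0.10), (0.06, 0.06), (0.06, 0.06)]
TARGETS = (10.0, 10.0)

coarse_model = coarsen(RATES, [[0, 1, 2, 3]])       # one cluster
refined_model = coarsen(RATES, [[0], [1], [2, 3]])  # a refinement of that cluster

coarse_revenue = first_price_revenue(coarse_model, TARGETS)
refined_revenue = first_price_revenue(refined_model, TARGETS)
```

Because the maximum is convex, the maximum of cluster averages never exceeds the average of the per-sub-cluster maxima, so the refined model cannot lower revenue in this simplified setting. The second-price and budget-constrained counterexamples in the abstract are outside what this sketch covers.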


2605.31034 2026-06-01 cs.LG cs.AI

Annealed Softmax Greedy in Many-Armed Bayesian Bandits

多臂贝叶斯老虎机中的退火Softmax贪婪算法

William Overman, Mohsen Bayati

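AI summary: Studies the Bayes regret of annealed softmax greedy in many-armed Bayesian Bernoulli bandits, proves that it attains a near-optimal Bayes regret rate when the prior satisfies a linear upper-tail condition (β-regularity with β=1), and draws a structural analogy to RLVR methods.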


AI中文摘要

具有可验证奖励的强化学习（RLVR）和基于组的策略优化方法（如GRPO）通过为每个提示采样多个完成并增加策略在奖励较高的完成上的概率来更新随机策略，同时通过KL惩罚向参考策略正则化。这些更新不包括追踪认知不确定性的显式机制。本文研究为何这种不确定性无关的更新仍然有效的一个风格化解释。我们分析了一个退火softmax（玻尔兹曼）策略，该策略在多臂贝叶斯伯努利老虎机中根据经验平均奖励的softmax选择动作。在先验满足线性上尾条件（β正则性的β=1情况）下，该条件意味着存在大量接近最优的臂，我们证明退火softmax贪婪算法实现了贝叶斯遗憾$\tilde{O}(m + T/m)$，特别地，当臂数$m = Θ(\sqrt{T})$时，遗憾为$\tilde{O}(\sqrt{T})$。这是该机制下接近最优的贝叶斯遗憾率，经验平均贪婪算法也能达到。在β正则性下，许多臂在整个学习过程中保持经验均值接近最优，因此当softmax采样一个非经验最优的臂时，该臂往往是另一个接近最优的臂，而不是明显较差的臂。相比之下，当臂数较少时，同类的softmax策略可能遭受线性遗憾。该结果也为RLVR提供了结构类比，其中以非可忽略概率产生正确完成的基础策略扮演了β正则性的角色。

英文摘要

Reinforcement learning with verifiable rewards (RLVR) and group-based policy optimization methods such as GRPO update a stochastic policy by sampling multiple completions per prompt and increasing the policy's probability on those with higher reward, regularized by a KL penalty toward a reference policy. These updates do not include explicit mechanisms that track epistemic uncertainty. This paper studies a stylized explanation for why such uncertainty-agnostic updates can nevertheless be effective. We analyze an annealed softmax (Boltzmann) policy that selects actions according to a softmax of empirical mean rewards in a many-armed Bayesian Bernoulli bandit. Under a linear upper-tail condition on the prior (the $β=1$ case of $β$-regularity), which implies an abundance of near-optimal arms, we prove that annealed softmax greedy achieves Bayes regret $\tilde{O}(m + T/m)$, and in particular $\tilde{O}(\sqrt{T})$ when the number of arms scales as $m = Θ(\sqrt{T})$. This is the near-optimal Bayes regret rate in this regime, attained also by empirical-mean greedy. Under $β$-regularity, many arms maintain empirical means close to the optimum throughout learning, so when softmax samples an arm other than the empirically best, that arm tends to be another near-optimal one rather than a clearly inferior one. By contrast, with a small number of arms, the same kind of softmax policy can suffer linear regret. The result also provides a structural analogy to RLVR, where a base policy with a non-negligible probability of producing a correct completion plays the role of $β$-regularity.
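A minimal simulation of the policy described above might look like the following sketch. The cooling schedule `temp0 / (t + 1)` and the optimistic value of 1.0 for arms that have not been pulled are assumptions made for illustration; the abstract does not specify either choice.

```python
import math
import random


def softmax(values, temperature):
    """Numerically stable softmax of values / temperature."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def annealed_softmax_bandit(true_means, horizon, rng, temp0=1.0):
    """Play a softmax policy over empirical means; return cumulative regret."""
    m = len(true_means)
    pulls = [0] * m
    sums = [0.0] * m
    best = max(true_means)
    regret = 0.0
    for t in range(horizon):
        # Assumption: an arm with no pulls is treated as having mean 1.0.
        means = [sums[i] / pulls[i] if pulls[i] else 1.0 for i in range(m)]
        temperature = temp0 / (t + 1)  # hypothetical annealing schedule
        probs = softmax(means, temperature)
        arm = rng.choices(range(m), weights=probs)[0]
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
        regret += best - true_means[arm]
    return regret
```

To mirror the many-armed regime, one could draw roughly `sqrt(T)` arm means from a uniform prior on [0, 1], for which P(μ > 1 − ε) = ε, and pass them as `true_means`. The sketch makes no claim to reproduce the regret rates proved in the paper.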


2605.31033 2026-06-01 cs.CV

SlotMemory: Object-Centric KV Memory for Streaming Long-Video Generation

SlotMemory: 面向流式长视频生成的以对象为中心的KV记忆

Weijia Dou, Hui Li, Jiahao Cui, Lei Zhou, Jingdong Wang, Siyu Zhu

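AI summary: Proposes SlotMemory, an object-centric key-value memory mechanism that decomposes the transformer's key-value manifold into discrete semantic slots, enabling entity-level persistence and prompt-aware retrieval, with a 22.8% relative improvement in dynamic consistency on 60-second interactive narratives.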


AI中文摘要

流式视频生成模型通常依赖于以时间为中心的记忆，将历史上下文组织为原始帧、片段或未聚类的令牌。这种组织方式常导致实体离开画面或交互式提示转换时出现身份漂移和语义不一致。为解决这些限制，我们提出SlotMemory，一种用于流式视频扩散的以对象为中心的键值记忆机制。我们的方法通过将变换器的键值流形分解为离散、可重用的语义槽，将记忆抽象从事件发生的“何时”转移到所表示的“什么”。通过利用这些槽作为路由地址来索引和存储高保真键值令牌，我们实现了跨长时域的实体级持久性和提示感知检索。在使用Wan2.1-T2V-1.3B骨干网络对60秒交互叙事进行评估时，SlotMemory达到了81.61的最先进质量分数，并在动态一致性上比现有最强流式基线相对提升22.8%。我们的结果表明，结构化的语义表示，而非原始时间容量，是持久长视频合成的关键原语。我们的代码和检查点可在https://tj12323.github.io/SlotMemory/获取。

英文摘要

Streaming video generation models typically rely on temporal-centric memory, which organizes historical context as raw frames, chunk segments, or unclustered tokens. This organization frequently leads to identity drift and semantic inconsistency when entities exit the frame or during interactive prompt transitions. To address these limitations, we propose SlotMemory, an object-centric Key-Value memory mechanism for streaming video diffusion. Our approach shifts the memory abstraction from "when" an event occurred to "what" is being represented by decomposing the transformer's key-value manifold into discrete, reusable semantic slots. By utilizing these slots as routing addresses to index and store high-fidelity key-value tokens, we enable entity-level persistence and prompt-aware retrieval across long horizons. Evaluated on 60-second interactive narratives using the Wan2.1-T2V-1.3B backbone, SlotMemory achieves a state-of-the-art quality score of 81.61 and a 22.8 percent relative improvement in dynamic consistency over the strongest existing streaming baseline. Our results demonstrate that structured semantic representation, rather than raw temporal capacity, is the essential primitive for persistent long-form video synthesis. Our codes and checkpoints are available at https://tj12323.github.io/SlotMemory/.


2605.31031 2026-06-01 cs.AI

GraphARC: A Comprehensive Benchmark for Graph-Based Abstract Reasoning

GraphARC：基于图的抽象推理综合基准

Saku Peltonen, August Bøgh Rønberg, Andreas Plesner, Roger Wattenhofer

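AI summary: Proposes the GraphARC benchmark, which extends abstract reasoning to graph-structured data, evaluates generalization on local, global, and hierarchical graph transformations through few-shot transformation learning tasks, and reveals a comprehension-execution gap and scaling barriers in language models.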


DOI: 10.1145/3770855.3817591
Comments: Accepted at KDD 2026 Datasets and Benchmarks Track

AI中文摘要

关系推理是智能的核心，但现有基准通常局限于网格或文本格式。我们引入了GraphARC，一个用于图结构数据抽象推理的基准。GraphARC推广了抽象与推理语料库（ARC）的少样本变换学习范式。每个任务需要从几个输入-输出对中推断变换规则，并将其应用于新的测试图，涵盖局部、全局和层次图变换。与基于网格的ARC不同，GraphARC实例可以在不同的图族和规模上大规模生成，从而能够系统评估泛化能力。我们在GraphARC上评估了最先进的语言模型，并观察到明显的局限性。模型能够回答关于图属性的问题，但往往无法解决完整的图变换任务，揭示了理解-执行差距。在更大实例上性能进一步下降，暴露了规模扩展障碍。更广泛地说，通过将节点分类、链接预测和图生成的方面结合在一个单一框架内，GraphARC为未来的图基础模型提供了一个有前景的测试平台。

英文摘要

Relational reasoning lies at the heart of intelligence, but existing benchmarks are typically confined to formats such as grids or text. We introduce GraphARC, a benchmark for abstract reasoning on graph-structured data. GraphARC generalizes the few-shot transformation learning paradigm of the Abstraction and Reasoning Corpus (ARC). Each task requires inferring a transformation rule from a few input-output pairs and applying it to a new test graph, covering local, global, and hierarchical graph transformations. Unlike grid-based ARC, GraphARC instances can be generated at scale across diverse graph families and sizes, enabling systematic evaluation of generalization abilities. We evaluate state-of-the-art language models on GraphARC and observe clear limitations. Models can answer questions about graph properties but often fail to solve the full graph transformation task, revealing a comprehension-execution gap. Performance further degrades on larger instances, exposing scaling barriers. More broadly, by combining aspects of node classification, link prediction, and graph generation within a single framework, GraphARC provides a promising testbed for future graph foundation models.


2605.31029 2026-06-01 cs.CV

PEEK: Picking Essential frames via Efficient Knowledge distillation

PEEK: 通过高效知识蒸馏提取关键帧

Killian Steunou, Anas Filali Razzouki, Khalil Guetari, Mounîm A. El-Yacoubi, Yannis Tevissen

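AI summary: Proposes PEEK, which uses knowledge distillation to transfer a teacher model's frame relevance rankings to a lightweight temporal model, enabling efficient dynamic frame sampling and markedly improving video captioning at low frame budgets.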


Comments: Supplementary material at https://www.killian-steunou.com/peek/static/pdfs/peek_supplementary.pdf

AI中文摘要

视频语言模型只能处理有限数量的帧，使得帧选择成为高效视频字幕生成的关键瓶颈。大多数字幕生成流程仍依赖均匀采样，该方法计算成本低但忽略视觉内容。自适应帧采样最近成为从视频中选择最具信息量帧的有前景方法，但现有方法计算成本仍然高昂。我们提出PEEK，一种高效的动态帧采样方法，它将字幕条件帧相关性排名从更强的教师模型蒸馏到仅基于视觉内容运行的轻量级时序模型中。我们发现，总体而言，在ActivityNet Captions和MSR-VTT上，我们的方法在所有评估的下游视觉语言模型中优于最先进方法，特别是当仅选择一或两帧进行字幕生成时，在大多数帧预算下获得最佳CIDEr分数。在ActivityNet Captions上，PEEK尤其强大，在16个配置中赢得14个。在MSR-VTT上的零样本评估表明，我们的模型在低帧预算下迁移效果最佳，而在四帧和八帧时结果更为混合，因为时间覆盖和视觉多样性变得更具竞争力。与最近的自适应基线相比，PEEK在低预算场景下更准确且更高效：它仅增加5.2%的字幕生成时间，而CSTA增加65.4%，MaxInfo增加211.9%。我们在https://github.com/momentslab/peek发布代码和预训练检查点。

英文摘要

Video-language models can process only a limited number of frames, making frame selection a key bottleneck for efficient video captioning. Most captioning pipelines still rely on uniform sampling, which is computationally cheap but agnostic to visual content. Adaptive frame sampling has recently emerged as a promising approach for selecting the most informative frames from a video; however, existing methods remain computationally expensive. We introduce PEEK, an efficient dynamic frame sampling method that distills caption-conditioned frame relevance rankings from a stronger teacher model into a lightweight temporal model that operates only on visual content. We find that, overall, on ActivityNet Captions and MSR-VTT, our method outperforms state-of-the-art methods across all evaluated downstream vision-language models, especially when only one or two frames are selected for captioning, obtaining the best CIDEr for most frame budgets. On ActivityNet Captions, PEEK is particularly strong, winning 14 out of 16 configurations. Zero-shot evaluation on MSR-VTT shows that our model transfers best at low frame budgets, while results at four and eight frames are more mixed as temporal coverage and visual diversity become increasingly competitive. Compared with recent adaptive baselines, PEEK is both more accurate in the low-budget regime and more efficient: it adds only 5.2% to the captioning time, compared with 65.4% for CSTA and 211.9% for MaxInfo. We release our code and pre-trained checkpoint at https://github.com/momentslab/peek.


2605.31027 2026-06-01 cs.LG

Multi-Scale Separable Fourier Neural Networks for Solving High-Frequency PDEs

多尺度可分离傅里叶神经网络用于求解高频偏微分方程

Qihong Yang, Qiaolin He

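AI summary: Proposes Multi-Scale Separable Fourier Neural Networks (MS-SFNN), which solve linear and nonlinear high-frequency PDEs efficiently and accurately using a separable representation, randomly initialized fixed weights, and Fourier feature embedding.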


Comments: 51 pages, 27 figures

AI中文摘要

我们提出了一种新颖的神经网络架构，称为多尺度可分离傅里叶神经网络（MS-SFNN），用于精确高效地求解线性和非线性高频偏微分方程（PDE）。MS-SFNN利用可分离表示：给定一个$d$维输入，它采用$d$个独立的子网络——每个作用于单个坐标——并通过其输出的逐元素乘法构造基函数。PDE解被近似为这些基函数的线性组合，系数由最小二乘法确定。关键的是，所有网络权重和偏置仅从单位方差的均匀分布随机初始化一次，之后保持不变。为了增强表达能力，在每个子网络中引入可调缩放因子以调节所得基函数的频率内容。通过余弦激活显式嵌入傅里叶特征，赋予该方法强大的谱逼近能力。为了缓解高频或三维问题中密集配置带来的内存瓶颈，我们用解析推导的基函数导数替代自动微分，并开发了一种内存高效的批处理QR分解算法来求解大规模最小二乘系统。数值实验表明，MS-SFNN在一系列具有挑战性的PDE上达到了前所未有的精度，显著优于物理信息神经网络（PINN）和分离变量谱神经网络（SV-SNN）等最先进方法。

英文摘要

We propose a novel neural network architecture, termed Multi-Scale Separable Fourier Neural Networks (MS-SFNN), for the accurate and efficient solution of linear and nonlinear high-frequency partial differential equations (PDEs). MS-SFNN exploits a separable representation: given a $d$-dimensional input, it employs $d$ independent subnetworks -- each acting on a single coordinate -- and constructs basis functions via element-wise multiplication of their outputs. The PDE solution is approximated as a linear combination of these basis functions, with coefficients determined by least squares. Critically, all network weights and biases are randomly initialized once, from a uniform distribution with unit variance, and remain fixed thereafter. To enhance expressivity, a tunable scaling factor is introduced in each subnetwork to modulate the frequency content of the resulting basis functions. Fourier features are explicitly embedded through cosine activations, endowing the method with strong spectral approximation capabilities. To mitigate the memory bottleneck associated with dense collocation in high-frequency or three-dimensional problems, we replace automatic differentiation with analytically derived basis function derivatives and develop a memory-efficient batched QR decomposition algorithm for solving large-scale least-squares systems. Numerical experiments demonstrate that MS-SFNN achieves unprecedented accuracy across a range of challenging PDEs, significantly outperforming state-of-the-art methods such as Physics-Informed Neural Networks (PINN) and Separated-Variable Spectral Neural Networks (SV-SNN).
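The separable construction can be sketched in a few lines of plain Python. This is an illustration under several assumptions, not the authors' implementation: the scaling factor multiplies the random weights, the coefficients are found from ridge-regularized normal equations rather than the batched QR decomposition the abstract mentions, and the solver fits plain function values (for a PDE, each row would instead hold the differential operator applied analytically to the basis functions). All function names are hypothetical.

```python
import math
import random


def make_subnetworks(dim, n_basis, scale, rng):
    """Draw fixed random (weight, bias) pairs: one per coordinate and basis function."""
    half_width = math.sqrt(3.0)  # Uniform(-sqrt(3), sqrt(3)) has unit variance
    return [[(scale * rng.uniform(-half_width, half_width),
              rng.uniform(-half_width, half_width))
             for _ in range(n_basis)] for _ in range(dim)]


def basis(subnets, x):
    """Basis j is the product over coordinates k of cos(w_kj * x_k + b_kj)."""
    values = []
    for j in range(len(subnets[0])):
        term = 1.0
        for k, xk in enumerate(x):
            w, b = subnets[k][j]
            term *= math.cos(w * xk + b)
        values.append(term)
    return values


def solve_least_squares(rows, targets, ridge=1e-10):
    """Solve min ||A c - y|| via normal equations and Gaussian elimination.

    The small ridge term is an assumption added for numerical safety.
    """
    n = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    rhs = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(n)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(gram[r][col]))
        gram[col], gram[pivot] = gram[pivot], gram[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for row in range(col + 1, n):
            factor = gram[row][col] / gram[col][col]
            for k in range(col, n):
                gram[row][k] -= factor * gram[col][k]
            rhs[row] -= factor * rhs[col]
    coeffs = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(gram[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (rhs[row] - tail) / gram[row][row]
    return coeffs
```

A fit would evaluate `basis` at each collocation point to build `rows`, then call `solve_least_squares`. Normal equations square the condition number, which is one reason a QR-based solver is preferable at scale.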


2605.31025 2026-06-01 cs.CL

TRACE: Discovering Task-Specific Parameter via Adaptation-Aware Probing for Continual Fine-Tuning

TRACE: 通过适应感知探测发现任务特定参数以实现持续微调

Xiaosong Han, Ke Chen, Xindi Dai, Di Liang, Minlong Peng, Wei Pang, Fausto Giunchiglia, Xiaoyue Feng, Yonghao Liu, Renchu Guan

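AI summary: Proposes TRACE, which discovers task-specific core parameters via adaptation-aware probing and updates only those parameters during continual fine-tuning to mitigate catastrophic forgetting; transferability across models and scales is also validated.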


DOI: 10.1145/3770855.3817801
Comments: KDD2026

AI中文摘要

在实际部署中，大型语言模型通常需要跨任务持续适应以保持最新状态，新的微调应保留先前学到的技能。然而，不加区分地混合任务会稀释任务特化，而顺序微调（全参数或低秩适应）常因破坏性覆盖导致灾难性遗忘。基于回放的持续微调和维护单独的任务特定适配器可以缓解遗忘，但引入了额外的计算、存储和管理开销。认识到LLM参数对于任何单一任务都存在冗余，我们将持续任务适应重新定义为通过适应感知探测发现任务特定参数：短时预热探测暴露任务的适应轨迹，使我们能够识别并隔离每个任务所需的一小部分关键参数，以缓解灾难性遗忘。基于这一观点，我们引入了TRACE，一种通过适应感知探测发现任务特定参数以实现持续微调的新方法。我们进行短时预热微调，通过比较预热模型和预训练模型来推导任务特定核心参数。核心参数通过两种策略识别：重要性评分（L2范数和Fisher信息）和特异性分析（参数更新的余弦相似度）。在持续微调设置中，仅更新当前任务的核心参数，其余参数保持冻结，从而保留先前知识。我们在多个标准基准上进行了广泛实验，证明了所提方法的优越性能。此外，我们通过跨模型和规模迁移性研究验证了方法的泛化能力，展示了在资源约束下指导大规模模型微调的“小到大”范式。

英文摘要

In real-world deployment, LLMs are often adapted continually across tasks to keep them up to date in production, and each new round of fine-tuning should preserve previously learned skills. However, indiscriminately mixing tasks can dilute task specialization, while sequential fine-tuning (full-parameter or low-rank adaptation) often causes catastrophic forgetting due to destructive overwriting. Replay-based continual tuning and maintaining separate task-specific adapters can mitigate forgetting, but introduce additional compute, storage, and management overhead. Recognizing the redundancy of LLM parameters for any single task, we reframe continual task adaptation as task-specific parameter discovery via adaptation-aware probing: a short warm-start probe exposes a task's adaptation trace, enabling us to identify and isolate the small subset of parameters essential for each task to mitigate catastrophic forgetting. Building on this view, we introduce TRACE, a novel approach for discovering Task-specific paRameters via Adaptation-aware probing for Continual finE-tuning. We perform a short warm-start fine-tune to derive task-specific core parameters by comparing the warm-started and pre-trained models. Core parameters are identified via two strategies: importance scoring (L$_2$ norm and Fisher Information) and specificity analysis (cosine similarity of parameter updates). In continual fine-tuning settings, only the active task's core parameters are updated while others remain frozen, preserving prior knowledge. We conduct extensive experiments across multiple standard benchmarks to demonstrate the superior performance of our proposed method. Additionally, we validate the generalization of our method through a cross-model and scale transferability study, demonstrating a "small-to-large" paradigm that guides the fine-tuning of large-scale models under resource constraints.
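The probe-then-freeze idea can be sketched as follows. This toy version scores parameter groups only by the L2 norm of the warm-start update; the Fisher information score and the cosine-similarity specificity analysis named in the abstract are left out. The function names, the `keep_ratio` parameter, and the dictionary-of-lists parameter layout are all hypothetical.

```python
import math


def core_parameter_mask(pretrained, warm_started, keep_ratio):
    """Flag the parameter groups whose warm-start update has the largest L2 norm."""
    norms = {
        name: math.sqrt(sum((w - p) ** 2
                            for w, p in zip(warm_started[name], pretrained[name])))
        for name in pretrained
    }
    n_keep = max(1, round(keep_ratio * len(norms)))
    ranked = sorted(norms, key=norms.get, reverse=True)
    core = set(ranked[:n_keep])
    return {name: name in core for name in pretrained}


def masked_sgd_step(params, grads, mask, lr):
    """Apply an SGD step to core groups only; frozen groups are copied unchanged."""
    return {
        name: [v - lr * g for v, g in zip(values, grads[name])] if mask[name]
        else list(values)
        for name, values in params.items()
    }
```

In use, one mask would be computed per task after its short warm-start run, and every later gradient step for that task would go through `masked_sgd_step`, so parameters outside the mask keep the values earlier tasks relied on.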


2605.31023 2026-06-01 cs.AI cs.LG cs.MA

HADT: A Heterogeneous Multi-Agent Differential Transformer for Autonomous Earth Observation Satellite Cluster

HADT: 一种用于自主对地观测卫星集群的异构多智能体差分Transformer

Mohamad A. Hady, Muhammad Anwar Masum, Siyi Hu, Mahardhika Pratama, Jimmy Cao, Ryszard Kowalczyk

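AI summary: For autonomous Earth observation missions with heterogeneous satellite clusters, proposes a transformer-based architecture that achieves adaptive real-time resource management through relational observation-action tokenization and a differential attention mechanism, significantly outperforming baselines.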


Comments: Accepted in ECML-PKDD 2026. arXiv admin note: text overlap with arXiv:2511.12792

AI中文摘要

本文解决了执行对地观测任务（包括光学和合成孔径雷达卫星）的异构卫星集群中的自主资源管理问题。在自主运行模式下，卫星配备智能能力，能够根据最新条件实时决策，同时最小化与地面操作员的交互。传统的调度方法通常依赖数学模型来表示卫星任务和资源管理，然后通过优化算法求解。然而，当底层模型不可用、过于复杂或因空间任务环境中的动态变化和不确定性而不准确时，此类解决方案效果不佳。一个有前景的替代方案是将问题重新表述为序列决策过程，并应用无模型强化学习技术来实现自适应和实时资源管理。为此，我们提出了一种新颖的基于Transformer的架构，专门针对异构卫星集群自主对地观测任务，采用关系观测-动作令牌化和差分注意力机制。我们的实验结果表明，与现有基线相比，性能有显著提升。此外，所提出的架构在不同卫星集群数量下表现出强大的适应性和可迁移性。

英文摘要

This work addresses the problem of autonomous resource management in heterogeneous satellite clusters conducting Earth Observation (EO) missions, including optical and Synthetic Aperture Radar (SAR) satellites. In autonomous operation mode, satellites are equipped with intelligent capabilities enabling real-time decision-making based on the latest conditions, while requiring minimal interaction with ground operators. Traditional scheduling approaches typically rely on mathematical models to represent satellite mission and resource management, and then solve the problem with optimization algorithms. However, such solutions become less effective when the underlying models are unavailable, overly complex, or inaccurate due to dynamic changes and uncertainties inherent in the space mission environment. A promising alternative is to reformulate the problem as a sequential decision-making process and apply model-free reinforcement learning techniques to enable adaptive and real-time resource management. To this end, we propose a novel transformer-based architecture tailored for autonomous EO missions of heterogeneous satellite clusters, with relational observation-action tokenization and a differential attention mechanism. Our experimental results demonstrate significant performance improvements compared to the available baselines. Moreover, the proposed architecture exhibits strong adaptability and transferability with respect to varying numbers of satellite clusters.


2605.31022 2026-06-01 cs.LG

Augmented Lagrangian Predictive Coding

增广拉格朗日预测编码

Jeffrey Seely, Julian Gould

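AI summary: Proposes Augmented Lagrangian Predictive Coding (PC-ALM), which accumulates constraint errors into layer-local Lagrange multipliers so that local updates align with backpropagation gradients, matching backpropagation performance in deep networks.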


Comments: 22 pages, 10 figures

AI中文摘要

预测编码（PC）是反向传播（BP）的一种局部学习替代方案，通过局部能量最小化动力学而非全局反向传播来训练深度网络。我们引入了增广拉格朗日预测编码（PC-ALM），它保持了PC的推理预算，但通过将每层约束误差累积到层局部拉格朗日乘子中，使每个权重更新与BP对齐。在线性PC网络中，PC-ALM收敛到一个平衡点，其中精确的BP梯度仅通过层局部更新分布在整个网络中。我们在深度达128的非线性PC网络中分析了PC-ALM，并表明它在所有宽度-深度设置下匹配BP性能，特别是在PC表现不佳的深度窄网络中。PC-ALM在每层激活中引入了循环动力学。与PC在标量能量上的热流相比，PC-ALM动力学由增广拉格朗日上的对偶上升驱动。我们观察到在非常深的网络中“弹道”式信用传播，信用信号均匀分布在各层，而PC则是缓慢、扩散的信用传播。除了算法本身，增广拉格朗日框架提供了PC的泛化，并可能为分布式系统如何通过纯局部动力学计算和传播类似BP的信用信号提供见解。

英文摘要

Predictive coding (PC) is a local-learning alternative to backpropagation (BP), training deep networks via local energy-minimization dynamics rather than a global backward pass. We introduce Augmented Lagrangian Predictive Coding (PC-ALM), which maintains PC's inference budget but aligns each weight update toward BP by accumulating per-layer constraint errors into a layer-local Lagrange multiplier. In linear PC networks, PC-ALM converges to an equilibrium with exact BP gradients distributed across the network via only layer-local updates. We analyze PC-ALM in nonlinear PC networks up to depth 128 and show that it matches BP performance across all width-depth regimes, notably in deep narrow networks where PC underperforms. PC-ALM introduces recurrent dynamics in each layer's activations. Compared to PC's heat flow on a scalar energy, PC-ALM dynamics are driven by dual ascent on the augmented Lagrangian. We observe "ballistic" credit propagation across very deep networks, with credit signals evenly distributed across layers, compared to PC's slow, diffusive credit propagation. Beyond the algorithm itself, the augmented Lagrangian framework offers a generalization of PC, and may yield insights into how distributed systems could compute and propagate BP-like credit signals through purely local dynamics.
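The multiplier-accumulation step at the heart of an augmented Lagrangian method can be seen on a toy problem. The sketch below applies the generic method to minimizing 0.5·(x² + y²) subject to x + y = 1; it is not a predictive coding network, and the step sizes and iteration counts are arbitrary choices. The constraint error plays the role that per-layer constraint errors play in the abstract: it is accumulated into the multiplier after each inner minimization.

```python
def augmented_lagrangian_demo(rho=1.0, outer_steps=50, inner_steps=200, lr=0.1):
    """Minimize 0.5*(x^2 + y^2) subject to x + y = 1 with dual ascent on lam."""
    x, y, lam = 0.0, 0.0, 0.0
    for _ in range(outer_steps):
        # Inner loop: gradient descent on
        # L(x, y) = 0.5*(x^2 + y^2) + lam*c + 0.5*rho*c^2, with c = x + y - 1.
        for _ in range(inner_steps):
            c = x + y - 1.0
            grad_x = x + lam + rho * c
            grad_y = y + lam + rho * c
            x -= lr * grad_x
            y -= lr * grad_y
        # Dual ascent: accumulate the remaining constraint error in the multiplier.
        lam += rho * (x + y - 1.0)
    return x, y, lam
```

At convergence the constraint holds exactly and the multiplier equals the sensitivity of the objective to the constraint, here x = y = 0.5 and lam = -0.5.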


2605.31021 2026-06-01 cs.AI cs.CL cs.LG

A Persona-Based Evaluation Framework for Pluralistic Alignment in Generative AI

基于人格的生成式AI多元对齐评估框架

Atahan Karagoz

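AI summary: Proposes a state-space constrained emulation framework that replaces a single evaluation function with synthetic cognitive profiles, enabling pluralistic, perspective-dependent benchmarking that reflects real-world consensus variability; it also analyzes the stability problems of simulated evaluators and argues for dynamic regulatory mechanisms.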


AI中文摘要

当前生成式人工智能的对齐范式主要依赖单一基准测试框架，将人类判断的多元性简化为聚合统计基线，从而掩盖了评估中的文化、人口和语境变异性。我们引入一种用于AI评估的状态空间约束仿真框架，用代表不同人类视角的合成认知轮廓的结构化流形替代单一评估函数。我们表明，现代生成架构能够以高度一致性实例化和维护这些评估人格，从而实现一种更接近现实世界共识变异性的多元、视角依赖的基准测试。然而，我们进一步分析了这些模拟评估者在顺序推理和随机提示扰动下的稳定性，揭示了人格一致性的系统性退化，表现为状态空间漂移和语义不一致。这些发现表明，静态对齐约束不足以维持随时间推移的稳健评估行为。相反，我们主张必须在生成系统中嵌入动态的、可行性驱动的调节机制，以保持连贯的认知仿真。通过将基于人格的评估视为潜在表征流形上的结构化动力系统，本研究为更自适应、更符合人类、更注重语境的AI评估方法奠定了基础。

英文摘要

Current alignment paradigms for generative artificial intelligence rely predominantly on monolithic benchmarking frameworks that reduce the plurality of human judgment to aggregated statistical baselines, thereby obscuring cultural, demographic, and contextual variability in evaluation. We introduce a state-space constrained emulation framework for AI evaluation that replaces singular assessment functions with a structured manifold of synthetic cognitive profiles representing diverse human perspectives. We show that modern generative architectures can instantiate and maintain these evaluative personas with high consistency, enabling a form of pluralistic, perspective-dependent benchmarking that more closely reflects real-world consensus variability. However, we further analyze the stability of these simulated evaluators under sequential inference and stochastic prompt perturbations, revealing systematic degradation in persona coherence that manifests as state-space drift and semantic inconsistency. These findings suggest that static alignment constraints are insufficient for sustaining robust evaluative behavior over time. Instead, we argue for the necessity of embedding dynamic, viability-driven regulatory mechanisms within generative systems to preserve coherent cognitive emulation. By framing persona-based evaluation as a structured dynamical system over latent representation manifolds, this study provides a foundation for more adaptive, human-aligned, and context-sensitive approaches to AI evaluation.


2605.31016 2026-06-01 cs.LG

An Efficient and Scalable Graph Condensation with Structure-Preserving

一种高效且可扩展的保结构图压缩方法

Yulin Hu, Fuyan Ou, Ye Yuan

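AI summary: Proposes SP-ESGC, a structure-preserving graph condensation method that decouples node condensation from graph structure generation. It condenses graphs efficiently via heat kernel feature propagation and a hybrid clustering strategy, and uses a pre-trained edge predictor to generate transferable structural patterns, improving generalization across GNN architectures while keeping computational efficiency high.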


AI中文摘要

图压缩（GC）对于在资源受限场景中部署图神经网络（GNN）至关重要，它通过将大规模图压缩为紧凑的合成图来实现。现有的GC方法通常由于耦合优化而面临计算效率低的问题，并且在不同GNN架构上泛化能力差。为了解决这些挑战，本研究提出了一种高效且可扩展的保结构图压缩方法（SP-ESGC），该方法采用解耦设计，将节点压缩与图结构生成分离。具体来说，首先利用热核特征传播，通过谱图理论启发的扩散生成节点表示。进一步，设计了一种新颖的混合聚类策略，从节点表示中提取判别性的类内质心。最后，一个预训练的边预测器从原始图中推断可迁移的结构模式，确保合成图的准确生成。在真实世界图数据集上的大量实验表明，所提出的SP-ESGC实现了精确的图压缩，同时具有显著高的计算效率。此外，SP-ESGC在多种GNN架构上也具有良好的泛化能力。

英文摘要

Graph condensation (GC) is pivotal for enabling the deployment of Graph Neural Networks (GNNs) in resource-constrained scenarios by compressing large-scale graphs into compact synthetic counterparts. Existing GC methods commonly suffer from computational inefficiency due to coupled optimization, as well as poor generalization across GNN architectures. To address these challenges, this study proposes Efficient and Scalable Graph Condensation with Structure-Preserving (SP-ESGC), which uses a decoupled design that separates node condensation from graph structure generation. Specifically, it first employs heat kernel feature propagation to generate node representations via diffusion inspired by spectral graph theory. Next, a novel hybrid clustering strategy extracts discriminative intra-class centroids from the node representations. Finally, a pre-trained edge predictor infers transferable structural patterns from the original graph, ensuring accurate synthetic graph generation. Extensive experiments on real-world graph datasets demonstrate that the proposed SP-ESGC performs accurate GC with high computational efficiency. Moreover, SP-ESGC generalizes well across diverse GNN architectures.
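Heat kernel feature propagation is commonly written as H = exp(−tL)X, with L a graph Laplacian and X the node features. The sketch below approximates that expression with a truncated Taylor series on the symmetric normalized Laplacian. Both the choice of Laplacian and the series truncation are assumptions, since the abstract does not say how the kernel is defined or computed, and the function names are hypothetical.

```python
import math


def normalized_laplacian(adj):
    """L = I - D^{-1/2} A D^{-1/2} for a dense adjacency matrix."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    return [[(1.0 if i == j else 0.0)
             - (adj[i][j] / math.sqrt(deg[i] * deg[j]) if adj[i][j] else 0.0)
             for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Dense matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def heat_kernel_propagate(adj, features, t=1.0, order=10):
    """Approximate exp(-t L) X by summing the first `order` Taylor terms."""
    lap = normalized_laplacian(adj)
    term = [row[:] for row in features]  # k = 0 term is X itself
    out = [row[:] for row in features]
    for k in range(1, order + 1):
        # term_k = (-t / k) * L @ term_{k-1}, i.e. (-t)^k / k! * L^k X
        term = [[-t / k * v for v in row] for row in matmul(lap, term)]
        out = [[o + v for o, v in zip(orow, trow)]
               for orow, trow in zip(out, term)]
    return out
```

Larger `t` diffuses features further across the graph; `t = 0` returns the input features unchanged. For large graphs a sparse or polynomial-filter implementation would be needed in place of these dense loops.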


2605.31013 2026-06-01 cs.LG

Physics-Informed Coarsening for Multigrid Graph Neural Surrogates

物理信息粗化用于多重网格图神经网络代理

Amir Bazzi, David Cardinaux, Ramy Nemer, Jose Alaves, Arjun Kalkur Matpadi Raghavendra, Elie Hachem

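AI summary: For nonlinear elasticity, plasticity, and transient behavior in solid mechanics, proposes a multigrid graph neural network with a physics-informed coarsening strategy that retains high-strain and high-stress regions via a residual-based local activity score, enabling hierarchical message passing and improving long-rollout stability and accuracy.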


Comments: Accepted at ICML 2026. 16 pages, 5 figures

AI中文摘要

基于学习的偏微分方程代理最近在流体设置和结构化几何中达到了经典求解器的精度，同时实现了数量级的加速。相比之下，尽管存在非线性弹性、塑性和瞬态行为挑战标准架构，但针对可变形固体的鲁棒代理仍未得到充分探索。我们提出了一种用于固体力学的多重网格图神经网络，它将编码器-处理器-解码器主干与物理信息粗化策略相结合。我们的方法不是通过几何启发式进行下采样，而是使用基于残差的局部物理活动度量对节点进行评分，并优先保留高应变或应力集中区域，在最需要的地方分配多尺度容量。这通过分层消息传递保留了长程相互作用，同时提高了长期滚动的稳定性。我们在涵盖线性、非线性和瞬态状态的多个数据集上进行评估，并观察到与标准采样基线相比，在精度和滚动稳定性方面的一致提升。我们的结果突出了物理信息粗化对于固体力学中可扩展代理建模的重要性。

英文摘要

Learning-based surrogates for partial differential equations have recently matched the accuracy of classical solvers while achieving orders-of-magnitude speedups, predominantly in fluid settings and structured geometries. In contrast, robust surrogates for deformable solids remain underexplored, despite the presence of nonlinear elasticity, plasticity, and transient behavior that challenge standard architectures. We introduce a multigrid graph neural network for solid mechanics that couples an encoder-processor-decoder backbone with a physics-informed coarsening strategy. Instead of downsampling via geometric heuristics, our method scores nodes using a residual-based measure of local physical activity and preferentially retains regions of high strain or stress concentration, allocating multiscale capacity where it is most needed. This preserves long-range interactions through hierarchical message passing while improving stability over long rollouts. We evaluate on multiple datasets covering linear, nonlinear, and transient regimes, and observe consistent gains in accuracy and rollout stability compared to standard sampling baselines. Our results highlight the importance of physics-informed coarsening for scalable surrogate modeling in solid mechanics.
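The scoring-based coarsening step can be pictured with a small sketch. It assumes a per-node residual value is already available and simply keeps the nodes with the largest residual magnitude together with the edges between them. How the paper computes the residual and how it reconnects the coarse level are not stated in the abstract, so the function name, the `keep_ratio` parameter, and the edge rule are hypothetical.

```python
def coarsen_by_activity(residuals, edges, keep_ratio):
    """Keep the highest-residual nodes and the edges that join two kept nodes.

    Returns the kept node indices (in original numbering, sorted) and the
    surviving edges re-indexed to the coarse numbering.
    """
    n_keep = max(1, int(len(residuals) * keep_ratio))
    ranked = sorted(range(len(residuals)),
                    key=lambda i: abs(residuals[i]), reverse=True)
    kept = sorted(ranked[:n_keep])
    new_index = {old: new for new, old in enumerate(kept)}
    coarse_edges = [(new_index[u], new_index[v]) for u, v in edges
                    if u in new_index and v in new_index]
    return kept, coarse_edges
```

Dropping low-activity nodes this way can disconnect the coarse graph, so a practical multigrid level would also need a rule for adding edges between retained nodes. That step is omitted here.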


2605.31010 2026-06-01 cs.CL

MoG: Mixture of Experts for Graph-based Retrieval-Augmented Generation

MoG：用于基于图的检索增强生成的混合专家模型

Zheng Yuan, Chuang Zhou, Linhao Luo, Siyu An, Di Yin, Xing Sun, Xiao Huang

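AI summary: Proposes the MoG framework, which organizes knowledge into central hub graphs and sparsely activated expert graphs and uses a topology-aware router to select relevant expert graphs dynamically, addressing the irrelevant information introduced by retrieval from a unified knowledge base; it achieves over 20% relative improvement on MuSiQue.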


AI中文摘要

检索增强生成被广泛研究以将大型语言模型建立在外部证据上。然而，从统一的知识库中检索可能会不可避免地引入无关信息，从而误导复杂推理的生成。受混合专家（MoE）条件计算的启发，其中路由器为每个输入稀疏地选择专门的专家以及共享专家，我们提出了用于基于图的检索增强生成的混合专家模型，即MoG。它将知识组织为两个核心组件：（i）多样且始终可访问的枢纽图，编码语义和结构上的核心知识，并为专家激活提供上下文线索；（ii）稀疏激活的专家图，包含特定领域的证据。MoG首先访问枢纽图以识别一般证据并推导上下文线索。然后，一个拓扑感知路由器根据查询动态激活一组有限的专家图，从而将检索限制在一个集中的证据子空间中。在具有挑战性的基准测试上的大量实验表明，MoG始终优于强基线，在MuSiQue上相对提升超过20%。我们的代码可在https://github.com/DEEP-PolyU/MoG获取。

英文摘要

Retrieval-augmented generation is intensively studied to ground large language models on external evidence. However, retrieving from a unified knowledge base can introduce irrelevant information that may mislead generation for complex reasoning. Inspired by the conditional computation of mixture of experts (MoE), where a router sparsely selects specialized experts alongside shared ones for each input, we propose Mixture of experts for Graph-based Retrieval-Augmented Generation (MoG). It organizes knowledge into two core components: (i) diverse, always-accessible hub graphs that encode semantically and structurally central knowledge and provide contextual clues for expert activation, and (ii) sparsely activated expert graphs that contain domain-specific evidence. MoG first accesses hub graphs to identify general evidence and derive contextual clues. Then, a topology-aware router dynamically activates a limited set of expert graphs conditioned on the query, thereby confining retrieval to a focused evidence subspace. Extensive experiments on challenging benchmarks show that MoG consistently outperforms strong baselines, with over 20% relative improvement on MuSiQue. Our code is available at https://github.com/DEEP-PolyU/MoG.
