arXivDaily每日学术速递，同步arXiv全量数据，AI总结、翻译，覆盖人工智能、机器人、计算机、金融、统计学、数学、物理学、生物学、经济学、电气&系统等方向。

2605.15308 2026-05-18 cs.AI cs.LG cs.MA

SMCEvolve: Principled Scientific Discovery via Sequential Monte Carlo Evolution

SMCEvolve：通过序列蒙特卡洛进化进行原理性科学发现

Jiachen Jiang, Huminhao Zhu, Zhihui Zhu

AI总结 SMCEvolve通过将程序搜索视为从奖励倾斜的目标分布中采样，并利用序列蒙特卡洛采样器近似该分布，提出三种核心机制：自适应父采样、变异与接受的混合、自动收敛控制，从而在数学、算法效率、符号回归和端到端ML研究基准中超越现有系统。

详情

AI中文摘要

LLM驱动的程序进化已成为自动化科学发现的强大工具，但现有框架缺乏设计其各个组件的原理性指导，并无法保证搜索收敛。我们介绍了SMCEvolve，将其程序搜索重新解释为从奖励倾斜的目标分布中采样，并用序列蒙特卡洛（SMC）采样器近似该分布。从这一视角，三种核心机制浮现为原理性组件：自适应父采样、变异与接受的混合、自动收敛控制。我们进一步提供有限样本复杂性分析，该分析界定了达到目标近似误差所需的LLM调用预算。在数学、算法效率、符号回归和端到端ML研究基准上，SMCEvolve在超越现有最先进的进化系统的同时，使用更少的LLM调用次数在自定终止条件下运行。代码可在https://github.com/kongwanbianjinyu/SMCEvolve获取。

英文摘要

LLM-driven program evolution has emerged as a powerful tool for automated scientific discovery, yet existing frameworks offer no principled guide for designing their individual components and provide no guarantee that the search converges. We introduce SMCEvolve, which recasts program search as sampling from a reward-tilted target distribution and approximates it with a Sequential Monte Carlo (SMC) sampler. From this view, three core mechanisms emerge as principled components: adaptive parent resampling, mixture of mutation with acceptance, and automatic convergence control. We further provide a finite-sample complexity analysis that bounds the LLM-call budget required to reach a target approximation error. Across math, algorithm efficiency, symbolic regression, and end-to-end ML research benchmarks, SMCEvolve surpasses state-of-the-art evolving systems while using fewer LLM calls under self-determined termination. The code is available at https://github.com/kongwanbianjinyu/SMCEvolve.

URL PDF HTML ☆

赞 0 踩 0

2605.15306 2026-05-18 cs.LG stat.ML

How Data Augmentation Shapes Neural Representations

数据增强如何塑造神经表示

Tianxiao He, Alex H. Williams, Sarah E. Harvey

AI总结研究探讨不同数据增强策略如何改变神经网络内部表示的几何结构，揭示增强强度与表示形状的关系，以及神经几何在模型集成中的应用。

详情

AI中文摘要

数据增强被广泛用于提升深度网络的泛化能力，但其对学习表示几何结构的影响仍不明确。本文通过形状分析工具，将网络隐藏表示嵌入到度量空间中，该空间对缩放、平移、旋转和反射不变。研究显示，增强强度增加导致空间中轨迹更稳定，不同增强类型引导表示朝不同方向发展。此外，研究探讨神经表示形状如何沿数据增强轨迹扭曲，并表明神经几何学可预测在模型集成中表现最佳的表示。结果揭示了不同架构和种子间的共享几何模式，表明分析形状空间轨迹是理解和比较数据增强方法的原理性工具。

英文摘要

Data augmentation is widely recognized for improving generalization in deep networks, yet its impact on the geometry of learned representations remains poorly understood. In this work, we characterize how different data augmentation strategies reshape internal representations in neural networks. Using tools from shape analysis, we embed network hidden representations into a metric space where distance is invariant to scaling, translation, rotation and reflection. We show that increasing augmentation strength leads to well-behaved trajectories in this space, and that different augmentation types steer representations in distinct directions. Moreover, we investigate how neural representation shapes are distorted along data augmentation trajectories, and show that insights from neural geometry can predict which representations provide the most improvement when ensembling models. Our results reveal shared geometric patterns across architectures and seeds, and suggest that analyzing shape-space trajectories offers a principled tool for understanding and comparing data augmentation methods.

URL PDF HTML ☆

赞 0 踩 0

2605.15304 2026-05-18 cs.CL

DiscoExplorer: An Open Interface for the Study of Multilingual Discourse Relations

DiscoExplorer：多语言语篇关系研究的开放接口

Amir Zeldes

AI总结本文提出DiscoExplorer，一个本地可运行的开源网页接口，用于公开多语言语篇关系数据集，涵盖16种语言，提供查询语言、搜索和可视化工具，展示连接词等信号装置及示例研究。

2605.15301 2026-05-18 cs.AI

Solvita: Enhancing Large Language Models for Competitive Programming via Agentic Evolution

Solvita：通过代理进化增强大型语言模型以应对编程竞赛

Han Li, Jinyu Tian, Rili Feng, Yuqiao Du, Chong Zheng, Chenyu Wang, Chenchen Liu, Shihao Li, Xinping Lei, Yifan Yao, Weihao Xie, Letian Zhu, Jiaheng Liu

AI总结 Solvita通过闭环系统和可训练知识网络，使代理动态学习，提升编程竞赛任务的准确性和经验积累。

详情

AI中文摘要

大型语言模型（LLMs）在严格的编程竞赛推理需求上仍存在挑战。尽管最近的多代理框架试图弥合这一可靠性差距，但它们本质上是无状态的：它们依赖静态检索并丢弃了从先前任务中获得的有价值的解决问题和调试经验。为了解决这一问题，我们提出了Solvita，一个代理进化框架，它允许持续学习而无需对基础LLM进行权重更新。Solvita将问题解决重新组织为一个闭环系统，包括策略选择、程序合成、认证监督和针对性破解，由四个专门的代理：规划者、求解者、Oracle和黑客执行。关键的是，每个代理都配有一个可训练的图结构知识网络。随着系统的运行，结果信号，如通过/失败判决、测试认证质量和黑客发现的对抗性漏洞，被重新解释为强化学习更新这些网络权重。这使代理能够根据过去的成功和失败动态路由未来的查询，从而在时间上积累可转移的推理经验。在CodeContests、APPS、AetherCode和实时Codeforces轮次中评估，Solvita在代码生成代理中建立了新的最先进的状态，优于现有的多代理流程，并几乎将单次流程基线的准确性翻倍。

英文摘要

Large language models (LLMs) still struggle with the rigorous reasoning demands of hard competitive programming. While recent multi-agent frameworks attempt to bridge this reliability gap, they remain fundamentally stateless: they rely on static retrieval and discard the valuable problem-solving and debugging experience gained from previous tasks. To address this, we present Solvita, an agentic evolution framework that enables continuous learning without requiring weight updates to the underlying LLM. Solvita reorganizes problem-solving into a closed-loop system of strategy selection, program synthesis, certified supervision, and targeted hacking, executed by four specialized agents: Planner, Solver, Oracle, and Hacker. Crucially, each agent is paired with a trainable, graph-structured knowledge network. As the system operates, outcome signals, such as pass/fail verdicts, test certification quality, and adversarial vulnerabilities discovered by the Hacker, are recast as reinforcement learning updates to these network weights. This allows the agents to dynamically route future queries based on past successes and failures, effectively accumulating transferable reasoning experience over time. Evaluated across CodeContests, APPS, AetherCode, and live Codeforces rounds, Solvita establishes a new state-of-the-art among code-generation agents, outperforming existing multi-agent pipelines and nearly doubling the accuracy of single-pass baselines.

URL PDF HTML ☆

赞 0 踩 0

2605.15300 2026-05-18 cs.CV

Deep Pre-Alignment for VLMs

视觉语言模型的深度预对齐

Tianyu Yu, Kechen Fang, Zihao Wan, Kaidong Zhang, Yicheng Zhang, Jun Song, Bo Zheng, Yuan Yao

AI总结本文提出深度预对齐(DPA)，通过替换传统ViT编码器为小型VLM作为感知器，实现视觉特征与目标大语言模型文本空间的深度对齐，提升了多模态基准性能，并降低了语言能力遗忘。

Comments Accepted by ICML 2026. Project Website: https://github.com/THUMAI-Lab/Deep-Pre-Alignment

详情

AI中文摘要

大多数视觉语言模型(VLMs)通过轻量级投影器将ViT编码器的输出直接映射到LLM。尽管有效，最近的分析表明这种架构存在对齐挑战：视觉特征在LLM的初始层仍远离文本空间，迫使模型在表面模态对齐上浪费关键深度~\cite{zhang-etal-2024-investigating,artzy-schwartz-2024-attend}，而非深入理解和复杂推理。在本工作中，我们提出深度预对齐(DPA)，一种新颖的架构，用小型VLM作为感知器替换标准ViT编码器，确保视觉特征与目标大语言模型的文本空间深度对齐。全面实验展示了DPA的有效性。在4B参数规模上，DPA在8个多模态基准上比基线高出1.9分，随着规模扩大到32B，增益扩大至3.0分。此外，通过将对齐任务委托给感知器，DPA在3个文本基准上实现了32.9\%的语言能力遗忘减少。我们进一步证明这些增益在不同LLM家族中保持一致，包括Qwen3和LLaMA 3.2，突显了我们方法的通用性。除了性能，DPA还为当前VLM开发提供了无缝升级路径，只需对视觉编码器进行模块化替换，计算开销微小。

英文摘要

Most Vision Language Models (VLMs) directly map outputs from ViT encoders to the LLM via a lightweight projector. While effective, recent analysis suggests this architecture suffers from an alignment challenge: visual features remain distant from the text space in the initial layers of the LLM, forcing the model to waste critical depth~\cite{zhang-etal-2024-investigating,artzy-schwartz-2024-attend} on superficial modality alignment rather than deep understanding and complex reasoning. In this work, we propose Deep Pre-Alignment (DPA), a novel architecture that replaces the standard ViT encoder with a small VLM as perceiver, ensuring visual features are deeply aligned with the text space of the target large language model. Comprehensive experiments demonstrate the effectiveness of DPA. On the 4B parameter scale, DPA outperforms baselines by 1.9 points across 8 multimodal benchmarks, with gains widening to 3.0 points at the 32B scale. Moreover, by offloading alignment to the perceiver, DPA achieves a 32.9\% reduction in language capability forgetting over 3 text benchmarks. We further demonstrate that these gains are consistent across different LLM families including Qwen3 and LLaMA 3.2, highlighting the generality of our approach. Beyond performance, DPA also offers a seamless upgrade path for current VLM development, requiring only a modular replacement for the visual encoder with marginal computation overhead.

URL PDF HTML ☆

赞 0 踩 0

2605.15298 2026-05-18 cs.RO cs.AI cs.CL cs.CV

PhysBrain 1.0 Technical Report

PhysBrain 1.0 技术报告

Shijie Lian, Bin Yu, Xiaopeng Lin, Changti Wu, Hang Yuan, Xiaolin Hu, Zhaolong Shen, Yuzhuo Miao, Haishan Liu, Yuxuan Tian, Yukun Shi, Cong Huang, Kai Chen

AI总结 PhysBrain 1.0 通过将大规模人类自体视频转化为结构化的物理常识监督，提升机器人适应能力，在多模态问答和具身控制基准测试中取得SOTA结果，尤其在SimplerEnv中表现突出。

Comments Project Page: https://phys-brain.github.io

详情

AI中文摘要

视觉-语言-动作模型快速发展，但机器人轨迹单独学习广泛物理理解有限。PhysBrain 1.0研究了一种互补方法：将大规模人类自体视频转换为结构化的物理常识监督，再用于机器人适应。我们的数据引擎提取场景元素、空间动态、动作执行和深度感知关系，将其转化为问题-答案监督训练PhysBrain VLMs。所得物理先验通过保留能力且语言敏感的适应设计转移至VLA策略。在多模态问答基准和具身控制基准，包括ERQA、PhysBench、SimplerEnv-WidowX、LIBERO和RoboCasa中，PhysBrain 1.0取得SOTA结果，尤其在SimplerEnv中表现突出。这些结果表明，从人类交互视频中扩展物理常识能有效连接多模态理解与机器人动作。

英文摘要

Vision-language-action models have advanced rapidly, but robot trajectories alone provide limited coverage for learning broad physical understanding. PhysBrain 1.0 studies a complementary route: converting large-scale human egocentric video into structured physical commonsense supervision before robot adaptation. Our data engine extracts scene elements, spatial dynamics, action execution, and depth-aware relations, then turns them into question-answer supervision for training PhysBrain VLMs. The resulting physical priors are further transferred to VLA policies through a capability-preserving and language-sensitive adaptation design. Across multimodal QA benchmarks and embodied control benchmarks, including ERQA, PhysBench, SimplerEnv-WidowX, LIBERO, and RoboCasa, PhysBrain 1.0 achieves SOTA results and shows especially strong out-of-domain performance on SimplerEnv. These results suggest that scaling physical commonsense from human interaction video can provide an effective bridge from multimodal understanding to robot action.

URL PDF HTML ☆

赞 0 踩 0

2605.15295 2026-05-18 cs.LG cs.AI cs.CY

GESD: Beyond Outcome-Oriented Fairness

GESD：超越以结果为导向的公平性

Gideon Popoola, John Sheppard

AI总结本文提出GESD，一种以过程为导向的公平性度量，用于衡量模型解释在不同保护类别子组中的稳定性、鲁棒性和敏感性差异。通过多目标优化框架FEU，提升公平性和实用性。

Comments 7 pages, Accepted at IEEE CAI

详情

AI中文摘要

机器学习（ML）算法日益应用于高风险决策领域，如贷款审批、招聘和再犯预测。尽管现有公平性度量（如统计平等、等机会）能有效量化结果导向的不平等，但对偏见决策的过程或解释缺乏洞察。为此，我们提出组级解释稳定性不平等（GESD），一种以过程为导向的公平性度量，衡量不同保护类别子组中模型解释稳定性、鲁棒性和敏感性的差异。GESD是解释器无关、模型无关的，并将公平性分析扩展到可解释性层面。我们进一步将GESD整合到多目标优化框架中，联合优化效用、基于结果的公平性和基于解释的公平性，称为FEU（公平性-可解释性-效用）。在多个基准数据集上的实验证明，GESD有效捕捉了组间解释质量的差异，且FEU在效用和公平性方面优于现有方法。通过连接基于结果和基于解释的公平性，GESD提供了一种全面的工具，用于诊断和减轻预测建模中的偏见。我们的代码和数据集可在GitHub上获得（https://github.com/horlahsunbo/GESD）

英文摘要

Machine learning (ML) algorithms are increasingly deployed in high-stakes decision-making domains such as loan approvals, hiring, and recidivism predictions. While existing fairness metrics (e.g., statistical parity, equal opportunity) effectively quantify outcome-oriented disparities, they offer limited insight into the procedure or explanation behind biased decisions. To address this gap, we propose Group-level Explanation Stability Disparity (GESD), a \textit{procedural-oriented} fairness metric that measures disparities in the stability, robustness, and sensitivity of model explanations across different subgroups in a protected category. %GESD is explainer-agnostic, model-agnostic, and extends the scope of fairness analyses to the level of explainability. We further integrate GESD into a multi-objective optimization framework that jointly optimizes for utility, outcome-based fairness, and explanation-based fairness called FEU (Fairness--Explainability--Utility). Empirical results on multiple benchmark datasets show that GESD effectively captures group-wise discrepancies in explanation quality, and that FEU improves both utility and fairness over state-of-the-art methods. By bridging outcome-based and explanation-based fairness, GESD offers a comprehensive tool for diagnosing and mitigating bias in predictive modeling. Our code and datasets are available on GitHub {\hyperlink{https://github.com/horlahsunbo/GESD}{https://github.com/horlahsunbo/GESD}}

URL PDF HTML ☆

赞 0 踩 0

2605.15290 2026-05-18 cs.LG cs.AI

GQA-μP: The maximal parameterization update for grouped query attention

GQA-μP：组查询注意力的最大参数更新

Kyle R. Chickering, Huijuan Wang, Mengxi Wu, Alexander Moreno, Muhao Chen, Xuezhe Ma, Daria Soboleva, Joel Hestness, Zhengzhong Liu, Eric Xing

AI总结本文基于谱特征学习观点，提出组查询注意力的最大参数更新方法，通过数学分析实现参数转移，解决了新模型架构下的参数更新难题。

Comments 18 pages

详情

AI中文摘要

超参数在不同模型架构间的转移显著减少了调整大型语言模型（LLMs）所需的计算量。最大更新参数化（μP）通过原则性的数学分析确保转移，但对新模型架构的推导可能具有挑战性。基于Yang等人（2023a）的谱特征学习观点，我们做出了两项进展。首先，我们将权重的谱范数条件从启发式方法提升到特征学习的定义，从而推导出Complete-P深度和权重衰减缩放，而无需依赖懒学习。其次，我们考虑了一种修改的谱范数，该范数在权重矩阵非满秩时保持网络权重的有效缩放定律。这使我们能够（到目前为止）推导出组查询注意力（GQA）的μP缩放。我们通过展示学习率在GQA重复超参数上的转移以及关于权重衰减的实验，证明了我们理论推导的有效性。

英文摘要

Hyperparameter transfer across model architectures dramatically reduces the amount of compute necessary for tuning large language models (LLMs). The maximal update parameterization (μP) ensures transfer through principled mathematical analysis but can be challenging to derive for new model architectures. Building on the spectral feature-learning view of Yang et al. (2023a), we make two advances. First, we promote spectral norm conditions on the weights from a heuristic to the definition of feature learning, and as a consequence arrive at the Complete-P depth and weight-decay scalings without recourse to lazy-learning. Second, we consider a modified spectral norm that preserves the valid scaling law of network weights when weight matrices are not full rank. This enables (to our knowledge, the first) derivation of μP scalings for grouped-query attention (GQA). We demonstrate the efficacy of our theoretical derivations by showing learning rate transfer across the GQA repetition hyperparameter as well as experiments regarding transfer over weight decay.

URL PDF HTML ☆

赞 0 踩 0

2605.15285 2026-05-18 cs.LG cs.AI cs.NA math.FA math.NA math.OC

Universal Approximation of Nonlinear Operators and Their Derivatives

非线性算子及其导数的通用逼近

Filippo de Feo

AI总结本文提出通过运算学习架构证明非线性算子及其导数的通用逼近定理，扩展了经典结果到无限维空间，并探讨了其在高阶精度、约束优化和无限维PDE数值方法中的应用。

详情

AI中文摘要

导数引导的算子学习（DIOL），即学习非线性算子及其导数，是运算学习（OL）基础领域中的开放研究前沿。特别是非线性算子及其导数的通用逼近定理（UAT）是非线性泛函分析中的基础性开放问题和精细问题。本文证明了非线性k次可微算子在巴纳赫空间之间及其导数的首个通用逼近定理，统一在紧集上和加权Sobolev范数中，适用于一般有限输入测度。我们的结果是首次将经典结果[1991]扩展到无限维设置和OL。我们讨论了DIOL和UATs的应用领域：OL中的高阶精度、Banach空间中的快速约束优化（如PDE最优控制、反问题）和无限维PDE的数值方法（如来自PDE最优控制的HJB PDEs在Banach空间、SPDEs、路径依赖系统、部分观测系统、均场控制）。我们通过编码器-解码器架构参数化非线性算子，这些架构因其通用性而著名，包括经典架构如DeepONets、Deep-H-ONets、PCA-Nets。我们的结果基于四个关键特性，使我们能够证明UATs的全面通用性：（i）巴纳赫空间的逼近性质。（ii）Bastiani意义下的k次连续可微性（弱于Fréchet意义下的k次连续可微性）。（iii）自然的紧-开拓扑用于UA；确实，我们显示在标准紧-开拓扑诱导的算子范数下，即使对于Fréchet导数，UA也遭到破坏。（iv）为UA构造新的加权Sobolev空间。

英文摘要

Derivative-Informed Operator Learning (DIOL), i.e. learning a (nonlinear) operator and its derivatives, is an open research frontier at the foundations of the influential field of Operator Learning (OL). In particular, Universal Approximation Theorems (UATs) of nonlinear operators and their derivatives are foundational open questions and delicate problems in nonlinear functional analysis. In this manuscript, we prove the first UATs of non-linear $k$-times differentiable operators between Banach spaces and their derivatives, uniformly on compact sets and in weighted Sobolev norms for general finite input measures, via OL architectures. Our results are the first complete generalizations of the corresponding influential classical results in [Hornik, 1991] to infinite-dimensional settings and OL. We discuss several open areas where DIOL and our UATs find applications: high-order accuracy in OL, fast constrained optimization in Banach spaces (e.g. optimal control of PDEs, inverse problems) and numerical methods for infinite-dimensional PDEs (e.g. HJB PDEs on Banach spaces from optimal control of PDEs, SPDEs, path-dependent systems, partially observed systems, mean-field control). We parameterize nonlinear operators via Encoder-Decoder Architectures, renowned classes in OL due to their generality, including classical architectures, such as DeepONets, Deep-H-ONets, PCA-Nets. Our results are based on four key features that allow us to prove UATs in full generality: (i) Approximation Properties of Banach spaces. (ii) $k$-times continuous differentiability in the sense of Bastiani (weaker than $k$-times continuous Fréchet differentiability). (iii) Natural compact-open topologies for UA; indeed, we show that UA in standard compact-open topologies induced by operator norms is violated even for Fréchet derivatives. (iv) Construction of novel weighted Sobolev spaces for the UA.

URL PDF HTML ☆

赞 0 踩 0

2605.15284 2026-05-18 cs.LG

Tadpole: Autoencoders as Foundation Models for 3D PDEs with Online Learning

Tadpole：用于3D偏微分方程的自动编码器作为基础模型的在线学习

Qiang Liu, Felix Koehler, Benjamin Holzschuh, Nils Thuerey

AI总结 Tadpole通过在线数据生成框架预训练自动编码器，学习跨异构物理系统的丰富可迁移表示，支持高维扩展和多任务应用，包括动态学习和生成建模。

详情

AI中文摘要

我们介绍了Tadpole，一种新的三维偏微分方程（PDE）基础模型，解决了可迁移性、高维可扩展性和多功能性等关键挑战。Tadpole在由高效在线数据生成框架生成的合成3D PDE数据上预训练为自动编码器。这使得能够进行大规模、多样化的训练，无需存储或I/O开销，通过扩展到相当于数百TB的训练数据进行演示。通过自动编码单通道空间裁剪，Tadpole在具有不同状态变量数量和空间分辨率的异构物理系统中学习丰富的、可迁移的表示。尽管仅预训练为自动编码器，Tadpole可以高效地应用于多种下游任务，包括动态学习和生成建模。对于动态学习，我们提出了一种新颖的参数高效微调策略，结合低秩适应、潜在空间转换和重新引入的跳跃连接，以最小的可训练参数数量实现精确的时间建模。Tadpole在各种下游任务中展示了强大的微调性能，突显了其作为3D PDE学习基础模型的通用性和有效性。Tadpole的源代码和预训练权重可在https://github.com/tum-pbs/tadpole获取。

英文摘要

We introduce Tadpole, a novel foundation model for three-dimensional partial differential equations (PDEs) that addresses key challenges in transferability, scalability to high dimensionality, and multi-functionality. Tadpole is pre-trained as an autoencoder on synthetic 3D PDE data generated by an efficient online data-generation framework. This enables large-scale, diverse training without storage or I/O overhead, demonstrated by scaling to an equivalent of hundreds of terabytes of training data. By autoencoding single-channel spatial crops, Tadpole learns rich and transferable representations across heterogeneous physical systems with varying numbers of state variables and spatial resolutions. Although pre-trained solely as an autoencoder, Tadpole can be efficiently applied for multiple downstream tasks beyond reconstruction, including dynamics learning and generative modeling. For dynamics learning, we propose a novel parameter-efficient fine-tuning strategy that integrates low-rank adaptation, latent-space transformations, and reintroduced skip connections, achieving accurate temporal modeling with a minimal number of trainable parameters. Tadpole demonstrates strong fine-tuning performance across various downstream tasks, highlighting its versatility and effectiveness as a foundation model for 3D PDE learning. Source code and pre-trained weights of Tadpole are available at https://github.com/tum-pbs/tadpole

URL PDF HTML ☆

赞 0 踩 0

2605.15282 2026-05-18 cs.CL

Fluency and Faithfulness in Human and Machine Literary Translation

Sarah Griebel, Ted Underwood

AI总结本研究探讨了文学翻译中流畅性与忠实度之间的关系，分析了106部小说中13万余段人工及机器翻译文本。通过自动评估方法，发现流畅性与忠实度存在显著负相关，且该现象在人类翻译和谷歌翻译中尤为明显，而TranslateGemma则表现出较弱的相关性。研究结果表明，在文学翻译中，提升流畅性可能以牺牲忠实度为代价，且评估结果受文本长度影响。

Comments Accepted NLP4DH 2026

2605.15257 2026-05-18 cs.LG

Training on Documents About Monitoring Leads to CoT Obfuscation

Reilly Haskins, Bilal Chughtai, Joshua Engels

AI总结本文研究了模型在了解监控机制的情况下是否会通过隐藏其推理过程来逃避检测。研究者通过合成文档微调的方式，使八种模型接触描述思维链（CoT）监控的预训练风格文档，发现具备监控意识的模型在逃避检测方面的表现显著优于无意识的对照组。研究还表明，模型的思维链可控性与其成功隐藏推理的能力高度相关，并且具备监控意识的模型在同等强化学习压力下更快学会规避监控。这些结果表明，监控知识与高思维链可控性的结合可能对基于CoT的监控系统构成潜在风险。

2605.15256 2026-05-18 cs.CV

ReactiveGWM: Steering NPC in Reactive Game World Models

Zeqing Wang, Danze Chen, Zhaohu Xing, Zizhao Tong, Yinhan Zhang, Xingyi Yang, Yeying Jin

AI总结当前游戏世界模型多从玩家视角出发，将非玩家角色（NPC）仅视为背景像素，难以捕捉玩家与NPC之间的互动。为此，本文提出ReactiveGWM，一种能够模拟玩家与NPC动态交互的反应型游戏世界模型。该模型通过解耦玩家控制与NPC行为，并引入轻量级偏差注入和跨注意力模块，实现了对NPC高层策略（如进攻、防守）的灵活响应，且无需针对具体游戏进行再训练，具备跨游戏的零样本策略迁移能力。

Comments The code is available at https://inv-wzq.github.io/ReactiveGWM/

2605.15254 2026-05-18 cs.LG

Curriculum Learning of Physics-Informed Neural Networks based on Spatial Correlation

Xujia Chen, Xinyue Hu, Letian Chen, Daming Shi, Wenhui Fan

AI总结本文针对物理信息神经网络（PINNs）在求解偏微分方程时面临的训练不稳定、多目标约束不平衡及信息传播效率低等问题，提出了一种基于空间相关性的课程学习框架。该方法通过空间因果权重引导边界附近区域的信息向内传播，利用低频信息桥增强空间分离区域的一致性，并采用区域自适应重加权策略优化局部残差，从而有效提升训练稳定性和解的精度。实验表明，在相近计算成本下，该方法显著改善了PINNs的训练效果。

Comments 37 pages, 14 figures, 9 tables

2605.15253 2026-05-18 cs.LG

Position: Ideas Should be the Center of Machine Learning Research

Jairo Diaz-Rodriguez

AI总结本文指出当前机器学习研究日益分化为追求指标优化的工程实践和脱离实际的理想化理论，忽视了研究的核心应是“想法”本身。作者提出“以想法为中心”的研究框架，强调通过设计针对性实验验证想法在现代模型中的行为特征，而非单纯追求榜单成绩。这一转变有助于弥合理论与实践之间的差距，同时促进研究公平性，使资源有限的研究者也能做出严谨的科学贡献。

Comments Accepted into ICML 2026 https://icml.cc/virtual/2026/poster/67144

2605.15252 2026-05-18 cs.LG cs.AI eess.SP

PDRNN: Modular Data-driven Pedestrian Dead Reckoning on Loosely Coupled Radio- and Inertial-Signalstreams

Peter Bauer, Andreas Porada, Felix Ott, Christopher Mutschler, Tobias Feigl

AI总结本文提出了一种名为PDRNN的模块化数据驱动行人航位推算系统，用于处理松耦合的无线电与惯性传感器信号流。该方法基于简单循环神经网络架构，能够隐式预测不同估计方法下的异步传感器数据流，并通过独立的机器学习模型分别估计姿态、速度和位置等关键参数及其方差，最终融合模型结合这些输出以提升系统鲁棒性。实验表明，PDRNN在动态运动数据上的精度和稳定性优于传统方法和现有机器学习方法，同时具备更好的组件控制能力和预测能力。

Comments 12 pages

详情

DOI: 10.1109/PLANS61210.2025.11028330
Journal ref: IEEE/ION Position, Location and Navigation Symposium (PLANS), Salt Lake City, UT, May 2025

英文摘要

Modern pedestrian dead reckoning (PDR) systems rely on fusing noisy and biased estimates of position, velocity, and calibrated orientation derived from loosely coupled sensors to determine the current pose of a localized object. However, discrepancies in the sampling rates of sensor-specific estimation methods and unreliable transmission pose significant challenges. And traditional methods often fail to effectively fuse multimodal sensor data during dynamic movements characterized by high accelerations, velocities, and rapidly varying orientations. To address these limitations, we propose a simple recurrent neural network (RNN) architecture capable of implicitly forecasting asynchronous sensor data streams from diverse estimation methods along reference trajectories. The proposed approach introduces PDRNN, a modular hybrid AI-assisted PDR system that handles each component as an independent ensemble of machine learning (ML) models to estimate both key parameter means and variances. Separate ML-based models are employed to estimate orientation, (un)directed velocity or distance from acceleration and gyroscope data, with optional absolute positioning from synchronized radio systems such as 5G for stabilization. A final fusion model combines these outputs, position, velocity, and orientation, while using uncertainty estimates to enhance system robustness. The modular design allows individual components to be updated, fine-tuned, or replaced without affecting the entire system. Experiments on dynamic sports movement data show that PDRNN achieves superior accuracy and precision compared to classic and ML-based methods, effectively avoiding error accumulation common in black-box approaches. And PDRNN offers forecast capabilities and better component control despite increased system complexity.

URL PDF HTML ☆

赞 0 踩 0

2605.15246 2026-05-18 cs.LG

Privacy Evaluation of Generative Models for Trajectory Generation

Stavros Bouras, Ioannis Kontopoulos, Chiara Pugliese, Francesco Lettich, Emanuele Carlini, Hanna Kavalionak, Chiara Renso, Konstantinos Tserpes

AI总结轨迹数据在现代城市智能中具有重要作用，但其敏感性也带来了显著的隐私风险。本文研究了生成模型在轨迹生成任务中的隐私保护问题，指出现有生成模型虽然能够生成符合时空分布和移动模式的合成轨迹数据，但其生成特性并不意味着隐私得到保障。通过实施成员推理攻击，作者揭示了生成轨迹模型在隐私保护方面的评估缺口，并证明其仍存在潜在的隐私泄露风险。

Comments Accepted at the 1st Workshop on Multi-Sensor Trajectory Knowledge Discovery and Extraction (MuseKDE 2026), co-located with the 27th IEEE International Conference on Mobile Data Management (IEEE MDM 2026)

2605.15243 2026-05-18 cs.LG cs.AI q-bio.BM q-bio.MN q-bio.QM

Reading the Cell, Designing the Cure: Perturbation-Conditioned Molecular Diffusion for Function-Oriented Drug Design

Ziyu Xu, Zijian Zhang, Liang Wang, Zhiyuan Liu, Qiang Liu, Shu Wu, Liang Wang

AI总结该研究提出了一种基于转录组的药物设计方法（TBDD），旨在根据期望的基因表达变化生成具有特定功能的分子。为了解决生物学与化学领域间的巨大差异以及转录组信号稀疏性带来的挑战，研究设计了多尺度的扩散生成模型CURE，其核心模块TFE能够提取功能导向的扰动特征，并跨模态对齐化学结构信息，从而生成结构合理且功能一致的候选药物分子。实验表明，该方法在多个基准测试中表现优异，并在零样本基因抑制剂设计任务中验证了其实际应用潜力。

2605.15242 2026-05-18 cs.LG

Logical Grammar Induction via Graph Kolmogorov Complexity: A Neuro-Symbolic Framework for Self-Healing Clinical Data Integrity

Abolfazl Zarghani, Amir Malekesfandiari

AI总结本文提出了一种名为Logic-GNN的神经符号框架，用于解决医疗信息系统中由人为错误引起的临床数据完整性问题。该方法将临床记录视为受潜在逻辑规则支配的结构化“私有语言”，结合时序图神经网络与图 Kolmogorov 复杂度，推导出描述医疗交互逻辑的符号语法规则，并将异常定义为违反该语法导致图描述长度显著增加的情况。实验表明，该方法在区分医疗异常与数据错误方面表现出色，F1 分数达到 0.94，优于现有方法，并具备实时自我修复功能以维护数据完整性。

2605.15235 2026-05-18 cs.LG

MuteBench: Modality Unavailability Tolerance Evaluation for Incomplete Multimodal Fusion

Wugeng Zheng, Ziwen Kan, Tianlong Chen, Chen Chen, Song Wang

AI总结 MuteBench 是一个用于评估不完整多模态融合系统在模态缺失情况下的鲁棒性的基准，涵盖了7个临床领域的9个数据集、6种融合架构和两种缺失数据模式。研究发现，架构类型是影响系统鲁棒性的最主要因素，而通道独立模型在处理模态缺失时表现较好，但在处理模态内缺失时可能存在问题。该基准为临床AI系统的设计与选择提供了重要参考。

2605.15231 2026-05-18 cs.LG cs.CV

Mask-Morph Graph U-Net: A Generalisable Mesh-Based Surrogate for Crashworthiness Field Prediction under Large Geometric Variation

Haoran Li, Tobias Lehrer, Yingxue Zhao, Haosu Zhou, Philipp Stocker, Tobias Pfaff, Nan Li

AI总结该研究提出了一种名为Mask-Morph Graph U-Net（MMGUNet）的新型图神经网络模型，旨在解决在几何变化较大的情况下，基于图神经网络的碰撞安全性场预测模型泛化能力不足的问题。该方法通过特征对齐的重心参数化技术对粗化图结构进行形态变换，以保持空间对应关系，同时结合节点掩码预训练和参数高效的微调策略，提升模型在不同输入网格上的预测精度和数据效率。实验表明，该模型在多种测试场景下均优于现有方法，为碰撞安全性设计的高效仿真提供了可行的替代方案。

Comments 48 pages, 15 figures, jounral paper to be submitted

详情

英文摘要

Nonlinear finite element crash simulations are accurate but computationally expensive, limiting their use in iterative design optimisation. Machine-learning surrogate models based on graph neural networks (GNNs) offer a faster alternative. Message-passing GNNs are widely used for mesh simulation, and their shared node and edge update functions are relatively generalisable across varying graph structures. By contrast, non-shareable edge-specific aggregation layers can capture nonlinear relationships more accurately but usually require fixed graph connectivity, which limits generalisability. This paper presents Mask-Morph Graph U-Net (MMGUNet), a practical approach to addressing the limitation of hierarchical Graph U-Net architectures that use edge-specific downsampling and upsampling layers. Fixed coarse graph connectivity is required for edge-specific layers. To retain this while improving spatial correspondence, the proposed method morphs the coarsened graph hierarchy to each input mesh using feature-aligned barycentric parameterisation before constructing cross-graph edges. It further applies node masking during supervised pretraining, followed by parameter-efficient fine-tuning in which high-parameter edge-specific layers are frozen. The proposed approach is evaluated in in-distribution, out-of-distribution, and cross-component transfer settings using mean Euclidean distance and maximum intrusion percentage error. Results show that coarse-graph morphing improves test accuracy relative to a fixed-coarse-graph baseline, while masked supervised pretraining reduces the train-test discrepancy and improves data efficiency during transfer. The proposed model also achieves lower prediction error compared with external baselines. These results demonstrate a practical route toward reusable, data-efficient mesh-based surrogate modelling for crashworthiness design exploration.

URL PDF HTML ☆

赞 0 踩 0

2605.15228 2026-05-18 cs.AI cs.LG

Verifiable Agentic Infrastructure: Proof-Derived Authorization for Sovereign AI Systems

Jun He, Deying Yu

AI总结本文研究了主权AI系统中自主智能体执行操作时的授权验证问题，提出了一种基于可信证明的分布式授权框架（DTF）。该框架通过结构化、可验证的证明对象来动态生成执行权限，确保所有高风险操作都必须基于共识验证的证明，并与证据链绑定，从而实现对智能体行为的可控、可审计和可追溯。该方法为云原生环境中的自主AI系统提供了安全、去中心化的授权基础设施。

Comments 19 pager, 2 figures, 4 tables

详情

英文摘要

Modern cloud and enterprise systems rely on identity-centric authorization, assuming that callers possessing valid credentials are safe to execute commands. The emergence of autonomous AI agents invalidates this assumption: agents can generate syntactically valid but semantically unsafe actions, making standing privileges a significant operational risk. This risk becomes especially acute in sovereign AI systems, where autonomous agents may interact with cloud infrastructure, regulated data, financial workflows, and national-scale digital services. Governed mutation substrates reduce this risk by interposing on agent actions: agents submit intents, infrastructure evaluates context and policy, and execution is mediated. However, this shifts the trust boundary: how can the decision to authorize an intent be made verifiable, distributed, and replayable? We introduce a Distributed Trust Framework (DTF), a verification framework for governed mutation systems that computes execution authority from structured, verifiable artifacts. DTF introduces a Justification Proof to encode the admissibility basis of an action, a consensus model for independent evaluation, an ephemeral Execution Identity derived from the approved proof, and an append-only Evidence Chain that preserves the authorization lifecycle. Under stated substrate assumptions, this architecture enforces a compact authorization invariant: no high-stakes execution without a proof object, no derived authority without consensus, and no valid mutation detached from evidence. We define the model, instantiate it over an OpenKedge-based governed mutation substrate, and show how it maps onto cloud-native environments. By shifting authorization from standing identity to proof-derived authority, DTF provides an infrastructure foundation for making agentic execution governable, auditable, and bounded in sovereign AI deployments.

URL PDF HTML ☆

赞 0 踩 0

2605.15227 2026-05-18 cs.AI cond-mat.mtrl-sci cs.RO

NIMO Controller: a self-driving laboratory orchestrator based on the Model Context Protocol

Naruki Yoshikawa, Ryo Tamura

AI总结本文提出了一种基于模型上下文协议（MCP）的自主驾驶实验室（SDL）控制架构——NIMO Controller，旨在解决现有SDL软件框架缺乏标准化接口、难以支持AI代理的问题。该架构通过MCP服务器统一暴露所有SDL功能，并提供了基于MCP工具发现的可视化编程接口，使用户无需编写代码即可设计实验流程，同时支持AI代理通过同一后端进行交互。研究通过颜色匹配实验验证了该架构的可行性与实用性。

Comments 9 pages, 4 figures

2605.15224 2026-05-18 cs.AI cs.MA

ICRL: Learning to Internalize Self-Critique with Reinforcement Learning

Jianbo Lin, Xiaomin Yu, Yi Xin, Yifu Guo, Zhuosong Jiang, Zhongqi Yue, Weishi Wang, Heqing Zou, Chengwei Qin, Hui Xiong

AI总结本文提出了一种基于强化学习的新型框架ICRL，旨在使大型语言模型在获得自我批评反馈后能够内化这些指导，从而在无外部批评的情况下仍能保持良好的表现。该框架通过联合训练求解器和批评者，利用批评反馈带来的性能提升作为奖励，促使批评者生成更有助于改进的反馈。为了解决批评条件行为与无批评行为之间的分布偏移问题，ICRL引入了分布校准的重加权策略，并通过角色分组优势估计稳定联合优化过程。实验表明，ICRL在多种任务中均取得了显著提升，且训练出的批评者在性能上可与更大规模的模型相媲美。

2605.15220 2026-05-18 cs.CL cs.AI cs.LG

Always Learning, Always Mixing: Efficient and Simple Data Mixing All The Time

Michael Y. Hu, Apurva Gandhi, Kyunghyun Cho, Tal Linzen, Pratyusha Sharma

AI总结数据混合在语言模型训练中起着关键作用，决定了如何组合不同来源或类型的训练数据。本文提出了一种名为OP-Mix的高效数据混合算法，能够在整个语言模型训练生命周期中持续运行，解决了现有方法仅适用于单一训练阶段的问题。该方法通过在当前模型上训练低秩适配器并进行插值，低成本地模拟候选数据混合方案，从而避免了对代理模型的依赖，并始终基于模型的实际学习动态进行搜索。实验表明，OP-Mix在预训练、持续微调等任务中均能以更低的计算成本达到接近最优的性能。

2605.15218 2026-05-18 cs.AI cs.CE

CAX-Agent: A Lightweight Agent Harness for Reliable APDL Automation

Chenying Lin, Yichen Hai, Yi He, Ran Wang, Haiyan Qiang, Liang Yu

AI总结本文提出了一种轻量级的代理框架CAX-Agent，旨在提升MAPDL有限元仿真中的自动化可靠性。该框架通过引入领域特定的中间件，实现工具生命周期管理、工作流状态控制和故障恢复，从而解决大语言模型在该任务中常见的输出不一致和任务失败问题。实验评估表明，CAX-Agent中基于模型驱动的恢复策略在多个结构基准测试中表现出色，显著优于仅依赖规则或无恢复策略的方法。

Comments 8 pages, 6 figures, IEEE conference format

2605.15217 2026-05-18 cs.AI cs.CY cs.LG econ.GN q-fin.EC

Fair outputs, Biased Internals: Causal Potency and Asymmetry of Latent Bias in LLMs for High-Stakes Decisions

Jagdish Tripathy, Marcus Buckmann

AI总结本研究探讨了指令微调语言模型在高风险决策（如房贷审批）中表现出的行为公平性与其内部潜在偏见之间的不对称关系。研究发现，尽管模型在输出层面看似无偏，但其内部表示仍保留并放大了与种族相关的偏见，且这些隐藏的偏见具有因果影响力，能够通过特定干预引发决策反转。研究还揭示了这种偏见在不同群体间的不对称性，并指出仅关注输出的行为审计不足以识别和治理模型中的潜在偏差，需结合表示分析的双重评估框架。

Comments 39 pages, 16 figures, 2 tables

2605.15215 2026-05-18 cs.AI cs.SE

SkillSmith: Compiling Agent Skills into Boundary-Guided Runtime Interfaces

Duling Xu, Zheng Chen, Zaifeng Pan, Jiawei Guan, Dong Dong, Jialin Li, Bangzheng Pu

AI总结 SkillSmith 是一种边界引导的编译-运行时框架，旨在优化基于技能的智能体系统。该方法通过离线编译技能包为最小可执行接口，提取技能的细粒度操作边界，使智能体在运行时仅调用相关组件，从而减少冗余上下文注入和重复推理。实验表明，SkillSmith 显著降低了推理阶段的 token 使用量、思考迭代次数和执行时间，并提升了任务准确率，同时支持强模型生成的编译结果被轻量模型复用。

2605.15208 2026-05-18 cs.LG cs.AI

Quantization Undoes Alignment: Bias Emergence in Compressed LLMs Across Models and Precision Levels

Plawan Kumar Rath, Rahul Maliakkal

AI总结该研究探讨了量化压缩对大型语言模型（LLMs）偏见表现的影响，发现低精度量化会导致模型在多个任务中产生新的刻板印象行为，且这种变化与精度水平呈剂量反应关系。通过在多个模型和精度级别上的大规模实验，研究揭示了传统质量评估指标无法检测到这种偏见的增加，强调了在模型压缩前进行公平性检测的重要性。

Comments 7 pages, 4 figures, 4 tables. Accepted at IEEE Cloud Summit 2026. This is the author's accepted version; the version of record will appear in IEEE Xplore

2605.15207 2026-05-18 cs.LG cs.MA

TeamTR: Trust-Region Fine-Tuning for Multi-Agent LLM Coordination

Yi Xie, Siao Liu, Falong Fan, Yuanqi Yao, Yue Zhao, Bo Liu

AI总结多智能体大语言模型系统在复杂推理任务中展现出潜力，但近期评估显示其性能常低于单一模型基线。本文识别出共享上下文团队在顺序微调中存在结构性失效模式，即更新一个智能体会导致团队上下文分布偏移，而后续使用缓存轨迹进行评估会加剧这种偏差。为此，作者提出TeamTR信任域框架，通过每次更新后重新采样轨迹并控制每个智能体的分布偏差，从而保证每次更新和每个阶段的改进下界。实验表明，TeamTR在平均性能上优于单智能体和顺序微调方法约7.1%，有效缓解了协调退化问题，并支持组件的即插即用替换。

Comments 9pages, Accepted at ICML2026