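arXivDaily: a daily academic digest synced with the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.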

2605.26886 2026-05-27 cs.DS cs.LG

Parsimonious Learning-Augmented Online Metric Matching

简约学习增强的在线度量匹配

Yongho Shin, Phanu Vajanopath

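AI summary: For online metric matching, proposes a parsimonious learning-augmented algorithm that fills in missing predictions with virtual predictions, establishes lower bounds on performance, and validates the approach experimentally.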

Comments: To appear in ICML 2026

Chinese abstract (AI-generated)

近年来，学习增强算法受到了广泛关注，尤其是在在线优化领域。由于生成预测的高计算成本，越来越多的研究关注于学习增强算法中性能保证与预测使用数量之间的权衡，例如缓存和度量任务系统问题。在本文中，我们将这一研究方向扩展到在线度量匹配，开发了简约学习增强算法并建立了其性能下界。我们的方法将“跟随预测”框架扩展到简约设置，通过在缺乏实际预测时使用一种在线度量匹配算法来填充虚拟预测，该算法在执行过程中保持良好中间匹配。我们通过实证评估补充了理论结果，证明了我们方法的实际有效性。

English abstract

Learning-augmented algorithms have received significant attention in recent years, particularly in the context of online optimization. Motivated by the high computational cost of generating predictions, a growing line of work studies the tradeoff between performance guarantees and the number of predictions used in learning-augmented algorithms for problems such as caching and metrical task systems. In this paper, we extend this line of research to online metric matching by developing parsimonious learning-augmented algorithms and establishing lower bounds on their performance. Our approach extends the Follow-the-Prediction framework to the parsimonious setting by filling in a virtual prediction in the absence of an actual prediction, using an online metric matching algorithm that maintains good intermediate matchings throughout its execution. We complement our theoretical results with an empirical evaluation, demonstrating the practical effectiveness of our approach.

2605.26879 2026-05-27 cs.CV

Natural Human Motion Recovery by Aligning High-Order Temporal Dynamics from Monocular Videos

通过对齐单目视频中的高阶时间动态恢复自然人体运动

Dingkun Wei, Zehong Shen, Yan Xia, Georgios Pavlakos, Yujun Shen, Xiaowei Zhou

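AI summary: Proposes the HTD-Refine framework, which refines global trajectories using high-order temporal dynamics (velocity and acceleration) estimated by PVA-Net to recover natural human motion.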

Comments: 13 pages, 6 figures. Accepted as an Oral presentation and Best Paper Candidate at CVPR 2026. Project page: https://zju3dv.github.io/htd-refine/

Chinese abstract (AI-generated)

从单目视频中恢复的人体运动通常显得过于平滑或动态不一致，即使关节位置在数值上是准确的。我们观察到，这种局限性源于缺乏可靠的高阶时间线索——速度和加速度——这些对于重建具有真实动量、时序和高频细节的运动至关重要。我们引入了HTD-Refine，一个后处理框架，通过显式估计的高阶时间动态来增强现有的人体运动恢复（HMR）流程。我们系统的核心是PVA-Net，一个时间变换器，它直接从单目视频推断每个关节的2D位置、3D速度和3D加速度。这些预测的动态作为全局优化过程中的软约束，优化世界空间轨迹，显著减少抖动、抑制过度平滑，并恢复物理上合理的运动。在具有挑战性的野外基准上的大量实验表明，HTD-Refine持续改进了最先进的HMR方法，产生了更准确的全局轨迹和更自然的运动动态。我们的结果强调了高阶时间建模在推进单目人体运动恢复中的关键作用。

English abstract

Human motion recovered from monocular videos often appears overly smooth or dynamically inconsistent, even when joint positions are numerically accurate. We observe that this limitation stems from the absence of reliable high-order temporal cues -- velocity and acceleration -- which are essential for reconstructing motion that exhibits realistic momentum, timing, and high-frequency detail. We introduce HTD-Refine, a post-processing framework that augments existing Human Motion Recovery (HMR) pipelines using explicitly estimated high-order temporal dynamics. At the core of our system is PVA-Net, a temporal transformer that infers per-joint 2D positions, 3D velocities, and 3D accelerations directly from a monocular video. These predicted dynamics serve as soft yet informative constraints in a global optimization procedure that refines world-space trajectories, significantly reducing jitter, suppressing over-smoothing, and restoring physically plausible motion. Extensive experiments on challenging in-the-wild benchmarks show that HTD-Refine consistently improves state-of-the-art HMR methods, yielding more accurate global trajectories and substantially more natural motion dynamics. Our results highlight the critical role of high-order temporal modeling in advancing monocular human motion recovery.
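To make the "high-order temporal cues" concrete: velocity and acceleration are the first and second time derivatives of joint position. The sketch below only illustrates those quantities using finite differences on a known trajectory. It is not PVA-Net, which per the abstract infers them directly from video. The function name `finite_differences` and the `fps` parameter are assumptions introduced here.

```python
def finite_differences(positions, fps):
    """Return (velocities, accelerations) for 3D positions sampled at `fps`.

    Illustrative only: forward differences on a known trajectory,
    not the learned estimator described in the abstract.
    """
    dt = 1.0 / fps
    # First difference: one velocity per pair of consecutive frames.
    velocities = [
        tuple((b - a) / dt for a, b in zip(p0, p1))
        for p0, p1 in zip(positions, positions[1:])
    ]
    # Second difference: one acceleration per pair of consecutive velocities.
    accelerations = [
        tuple((b - a) / dt for a, b in zip(v0, v1))
        for v0, v1 in zip(velocities, velocities[1:])
    ]
    return velocities, accelerations


# A joint moving along x with position t^2, sampled at 1 frame per second.
track = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (4.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
vel, acc = finite_differences(track, fps=1)
```

A trajectory with N frames yields N-1 velocities and N-2 accelerations, which is one reason differencing noisy position estimates tends to amplify jitter.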

2605.26878 2026-05-27 cs.AI

Multi-Stakeholder LLM Alignment: Decomposing Estimation from Aggregation

多方利益相关者LLM对齐：从聚合中分解估计

Lulu Zheng, Wenjin Yang, Xiangwen Zhang, Rong Yin, Yulan Hu, Zheng Pan, Xin Li

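AI summary: To address conflicting user preferences in multi-stakeholder tasks, proposes DecompR, which fixes the query structure with counterfactually calibrated weights and estimates role utilities independently, eliminating candidate-dependent weight drift and reducing estimation noise.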

2605.26870 2026-05-27 cs.MA cs.AI cs.HC

Persistent AI Agents in Academic Research: A Single-Investigator Implementation Case Study

学术研究中的持久性AI智能体：单研究者实施案例研究

Anas H. Alzahrani

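AI summary: Through a single-investigator case study, analyzes the architecture, usage, output, and governance of persistent AI agents in a real academic setting, and finds that a cache-dominant workflow may shift the economic unit from cost per token to cost per completed artifact.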

Comments: 19 pages, 2 figures, 3 main tables; supplementary appendix with 6 tables, 2 figures, and a reproducibility methods section. Describes 17 configured agents in a persistent research environment and introduces the PARE-M (Persistent Agentic Research Environment Measurement) framework

Chinese abstract (AI-generated)

背景：大型语言模型通常作为模型、基准或简短对话片段进行评估。当智能体持久嵌入真实学术研究环境，具有持久记忆、本地文件、外部工具、计划例程、委派角色和明确安全协议时，会发生什么知之甚少。方法：从2026年1月31日至5月25日进行了一项结构化自我观察的实施案例研究。分析单元是持久的人-智能体环境：研究者、智能体运行时、记忆层、工具、仓库、计划任务、专门智能体角色和治理规则。结果使用PARE-M（持久智能体研究环境测量）组织，这是一个涵盖架构、利用、工件生产、资源使用、可重复性和治理的测量框架。结果：可恢复的主智能体遥测包含96个活跃日中的75,671条去重记录，其中8,059条用户角色消息和23,710条助手角色消息。工作空间包括502个记忆相关文件、17个配置的智能体目录和57个技能文件。活跃系统时间为579.7小时（30分钟上限间隙估计）。记忆衍生记录识别出482个输出代理事件和889个失败、验证、纠正或协议代理事件。一个严格的2026年5月轨迹子集捕获了627个模型完成事件和73.95百万记录token，其中82.9%为缓存读取。结论：工作流以缓存为主导，表明持久智能体环境可能将经济单位从每token成本转向每完成工件成本。未来评估应使用工件级分母、可重复解析规则、纠正分类法和治理事件的独立编码。

English abstract

Background: Large language models are typically evaluated as models, benchmarks, or short conversational episodes. Less is known about what happens when an agent is embedded persistently in a real academic research environment with durable memory, local files, external tools, scheduled routines, delegated roles, and explicit safety protocols. Methods: A structured self-observed implementation case study was conducted from January 31 to May 25, 2026. The unit of analysis was the persistent human-agent environment: researcher, agent runtime, memory layer, tools, repositories, scheduled jobs, specialized agent roles, and governance rules. Outcomes were organized using PARE-M (Persistent Agentic Research Environment Measurement), a measurement framework covering architecture, utilization, artifact production, resource use, reproducibility, and governance. Results: Recoverable main-agent telemetry contained 75,671 de-duplicated records across 96 active days, with 8,059 user-role and 23,710 assistant-role messages. The workspace included 502 memory-related files, 17 configured agent directories, and 57 skill files. Active system time was 579.7 hours (30-minute capped-gap estimate). Memory-derived records identified 482 output-proxy events and 889 failure, verification, correction, or protocol-proxy events. A strict May 2026 trajectory subset captured 627 model-completed events and 73.95 million recorded tokens, of which 82.9% were cache reads. Conclusions: The workflow was cache-dominant, suggesting that persistent agentic environments may shift the economic unit from cost per token to cost per completed artifact. Future evaluations should use artifact-level denominators, reproducible parsing rules, correction taxonomies, and independent coding of governance events.
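The abstract does not define the "30-minute capped-gap estimate" of active time. One plausible reading, assumed here, is that the gaps between consecutive telemetry events are summed, with any gap longer than 30 minutes counted as only 30 minutes. The function name `active_hours` and its parameters are hypothetical.

```python
def active_hours(timestamps, cap_seconds=30 * 60):
    """Estimate active time in hours from event timestamps (in seconds).

    Assumed rule: each gap between consecutive events contributes at most
    `cap_seconds`, so long idle periods are not counted in full.
    """
    ordered = sorted(timestamps)
    total = sum(min(later - earlier, cap_seconds)
                for earlier, later in zip(ordered, ordered[1:]))
    return total / 3600.0


# Gaps of 10 min, 90 min (capped to 30 min), and 5 min.
events = [0, 600, 6000, 6300]
hours = active_hours(events)
```

Under this reading the estimate depends on the cap, so the reported hours would change if a different idle threshold were chosen.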

2605.26862 2026-05-27 cs.CV

RoadGIE: Towards A Global-Scale Aerial Benchmark for Generalizable Interactive Road Extraction

RoadGIE：面向通用交互式道路提取的全球尺度航拍基准

Chenxu Peng, Chenxu Wang, Yimian Dai, Yongxiang Liu, Ming-Ming Cheng, Xiang Li

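AI summary: Introduces WorldRoadSeg-360K, the largest and most diverse road segmentation dataset, and designs RoadGIE, an interactive method supporting connectivity-aware prompts that achieves state-of-the-art segmentation accuracy and topological consistency.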

Chinese abstract (AI-generated)

从航拍图像中准确分割道路是许多地理空间应用的基础。然而，现有数据集通常面临场景多样性有限、语义粒度低和结构连续性差的问题，限制了它们在不同环境中的泛化能力。为了解决这些挑战，我们引入了WorldRoadSeg-360K，这是迄今为止最大、最多样的道路分割数据集，包含从38个国家223个城市收集的366,947张高分辨率图像，覆盖不同地形和大陆。WorldRoadSeg-360K作为一个全面的基准，揭示了处理多样化和结构复杂场景的关键挑战。自动化方法通常难以保持道路连通性，而当前的交互式方法缺乏高效、拓扑敏感的工具用于实际道路编辑。为此，我们提出了RoadGIE，建立了一种新的遥感道路提取交互范式。与先前的点或框提示策略不同，RoadGIE支持连通性感知提示，包括点击和涂鸦，这些提示与道路网络的拓扑结构天然对齐。为了提高结构一致性并减轻迭代交互中的性能下降，RoadGIE集成了专家引导的提示策略，并针对交互场景调整了基于骨架的召回损失。RoadGIE在WorldRoadSeg-360K和其他基准上，在分割精度和拓扑一致性方面均达到了最先进的性能，同时仅需3.7M参数即可高效运行。代码公开于：https://github.com/chaineypung/RoadGIE

English abstract

Accurate road segmentation from aerial imagery is fundamental to many geospatial applications. However, existing datasets often suffer from limited scene diversity, low semantic granularity, and poor structural continuity, restricting their generalization across environments. To address these challenges, we introduce WorldRoadSeg-360K, the largest and most diverse road segmentation dataset to date, comprising 366,947 high-resolution images collected from 38 countries and 223 cities across various terrains and continents. WorldRoadSeg-360K serves as a comprehensive benchmark and reveals key challenges in handling diverse and structurally complex scenes. Automated approaches often struggle to preserve road connectivity, while current interactive methods lack efficient, topology-sensitive tools for real-world road editing. To this end, we present RoadGIE, establishing a novel interactive paradigm for road extraction in remote sensing. Unlike prior point- or box-based prompting strategies, RoadGIE supports connectivity-aware prompts, including clicks and scribbles, which inherently align with the topology of road networks. To improve structural consistency and mitigate performance degradation during iterative interactions, RoadGIE integrates an expert-guided prompting strategy and adapts the skeleton-based recall loss for interactive scenarios. RoadGIE achieves state-of-the-art performance in both segmentation accuracy and topological consistency on WorldRoadSeg-360K and other benchmarks, while maintaining efficient operation with only 3.7M parameters. The code is publicly available at: https://github.com/chaineypung/RoadGIE

2605.26861 2026-05-27 cs.CV

REVERSE: Reinforcing Evidence Verification and Search for Agentic Image geo-localization

REVERSE: 强化证据验证与搜索的智能体图像地理定位

Yong Li, Furong Jia, Dacheng Yin, Kang Rong, Fengyun Rao, Jing Lyu, Fan Zhang

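AI summary: Proposes REVERSE, a framework that reinforces the interplay between evidence search and verification through multi-turn agentic reasoning; on image geo-localization it outperforms strong retrieval-augmented baselines and, with a 4B model, rivals larger models.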

Chinese abstract (AI-generated)

图像地理定位旨在确定照片的拍摄地点，该任务通常需要识别可见地标之外的信息。人类专家通常通过迭代工作流程解决：检查信息区域，形成位置假设，寻求外部证据，并根据新线索修正判断。现有方法仅部分捕捉这一过程：直接预测方法完全绕过证据获取，而检索增强方法引入外部证据但通常对中间决策（搜索位置、查询方式、过滤噪声结果）提供有限监督。我们提出REVERSE，一个强化证据搜索与验证交互的框架，实现多轮智能体推理。REVERSE教授三个中间决策：看哪里、查什么、信任什么证据。为此，我们构建了带注释区域选择、搜索观察和地理信息证据标签的工具化轨迹，并引入视觉定位、查询效用和证据辨别的过程奖励。离线搜索缓存使检索观察在强化学习过程中稳定且可重用，实现对噪声搜索结果的密集监督。使用4B模型，REVERSE在Im2GPS3k和YFCC4k上优于强检索增强基线，并媲美显著更大的模型。代码见https://github.com/yonglleee/REVERSE。

English abstract

Image geo-localization aims to determine where a photograph was taken, a task that often requires more than recognizing visible landmarks. Human experts typically solve it through an iterative workflow: they inspect informative regions, form location hypotheses, seek external evidence, and revise their judgments as new clues appear. Existing methods only partially capture this process: direct prediction methods bypass evidence acquisition altogether, while retrieval-augmented methods introduce external evidence but usually provide limited supervision on the intermediate decisions of where to search, how to query, and how to filter noisy results. We present REVERSE, a framework that reinforces the interplay between evidence search and verification to enable multi-turn agentic reasoning. REVERSE teaches three intermediate decisions: where to look, what to query, and what evidence to trust. To support this, we construct tool-grounded trajectories with annotated region selections, search observations, and geo-informative evidence labels, and introduce process rewards for visual grounding, query utility, and evidence discrimination. An offline search cache makes retrieval observations stable and reusable during reinforcement learning, enabling dense supervision over noisy search results. With a 4B model, REVERSE outperforms strong retrieval-augmented baselines and rivals substantially larger models on Im2GPS3k and YFCC4k. Code is available at https://github.com/yonglleee/REVERSE.

2605.26857 2026-05-27 cs.LG

Generalist Graph Anomaly Detection via Prototype-Based Distillation

基于原型蒸馏的通才图异常检测

Yiming Xu, Zihan Chen, Zhen Peng, Song Wang, Bin Shi, Bo Dong, Chao Shen

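AI summary: Proposes ProMoS, the first unsupervised generalist graph anomaly detection framework, which distills normality priors from a frozen self-supervised GNN teacher and uses prototype-guided soft-label distillation to achieve zero-shot anomaly detection across graphs.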

Comments: Accepted by ICML 2026

Chinese abstract (AI-generated)

在高风险领域对图异常检测（GAD）的迫切需求驱动下，通才GAD范式（训练一个可迁移到新图的单一检测器）近年来日益受到关注。然而，现有方法通常依赖稀缺且昂贵的标注进行训练，有时甚至需要在推理时提供少量样本支持，这限制了其对多样且未见异常模式的鲁棒性。为解决这一局限，我们提出了ProMoS，首个无监督通才GAD框架，通过建模未标注数据中丰富的正常性来检测异常。ProMoS采用知识蒸馏范式，将正常性先验从冻结的自监督图神经网络（GNN）教师模型蒸馏到具有共享全局和轻量个性化分支的混合学生模型中，无需从头学习即可实现高效且富有表现力的正常性建模。我们进一步提出原型引导的软标签蒸馏，在共享原型空间中对齐教师和学生，增强跨图泛化能力。在推理时，ProMoS通过蒸馏偏差和原型几何偏差对未见图进行零样本异常检测。大量实验证明了ProMoS的有效性和高效性，为迈向无标签、零样本的通才GAD开辟了一条实用路径。

English abstract

Driven by the pressing demand for graph anomaly detection (GAD) in high-stakes domains, the generalist GAD paradigm, which trains a single detector transferable across new graphs, has recently gained growing attention. However, existing methods often rely on scarce and costly annotations for training and sometimes even require few-shot support at inference, which limits their robustness to diverse and unseen anomaly patterns. To address this limitation, we introduce ProMoS, the first unsupervised generalist GAD framework, which detects anomalies by modeling the abundant normality in unlabeled data. ProMoS adopts a knowledge-distillation paradigm to distill normality priors from a frozen self-supervised graph neural network (GNN) teacher to a mixture-of-students model with shared global and lightweight personalized branches, enabling efficient and expressive normality modeling without learning from scratch. We further propose prototype-guided soft-label distillation to align teacher and student in a shared prototype space, enhancing cross-graph generalizability. During inference, ProMoS performs zero-shot anomaly detection on unseen graphs via distillation bias and prototype geometric deviation. Extensive experiments show the effectiveness and efficiency of ProMoS, charting a practical path toward label-free, zero-shot generalist GAD.

2605.26856 2026-05-27 q-bio.NC cs.AI cs.RO

The Sensation Modulating Network: Haltability as the architectural ground for object-directed phenomenology

感觉调节网络：可停性作为对象导向现象学的架构基础

G. Nagarjuna, Durgaprasad Karnam

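AI summary: Proposes the Sensation Modulating Network (SMN) as an architecture for embodied cognition; through opponent dynamics and haltability, it grounds the intentionality of object-directed phenomenology (in Husserl's sense) in structural features of bodily organization, reconciling the debate between cognitivism and 4E cognition.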

Comments: 64 pages, main body 38 pages + References 6, Appendices 20 pages, Tables 3, and Figures 21

Chinese abstract (AI-generated)

认知科学仍然分裂为认知主义——它解释了递归和语言，但无法将形式符号扎根于意义——和4E方法——它将认知扎根于身体，但很少详细说明身体的架构以支持生成性。我们认为这一僵局源于对具身代理架构的不完整描述，并提出一个架构：感觉调节网络（SMN），即认知代理被构想为整个身体，在每个解剖尺度上由对手动力学组织，由感觉调节器构建，这些调节器通过一个基底感知和行动，配对成协调动作区，由全身广播网络路由。三个承诺赋予了SMN其效力。可停性——将对抗性可供性招募到共激活平衡中——提供了对象导向现象学（在胡塞尔意义上）所需的架构位置：对手性使得共激活成为可能，共激活使得停止成为可能，停止使得注意成为可能，注意使得意向指向成为可能，而无需在顶层添加任何模块。可自我调节动作模式（SMAP）的双信号特性使得自我/世界区分成为布线的结构特征，而非代理应用的范畴。四级动作模式层级——基础、可停、可协商、交易——提供了从自主规律性到公共惯例化的单一轨迹，将基于语法的生成性条件定位为架构转变。SMN调和了认知主义与4E的争论：递归存在于可协商动作模式的可修改动力学中，具身性存在于支持它们的对手基底中。附录中给出了一个初步的形式化方法和八个预测寄存器（七个可测试，一个假设性），以及参考模拟。

English abstract

Cognitive science remains split between cognitivism - which accounts for recursion and language but cannot ground formal symbols in meaning - and 4E approaches - which ground cognition in the body but rarely specify the body's architecture in enough detail to support generativity. We argue the impasse stems from an incomplete account of the embodied agent's architecture, and propose one: the Sensation Modulating Network (SMN), the cognitive agent conceived as the whole body, organized at every anatomical scale by opponent dynamics, built from Sensation Modulators that sense and act through one substrate, paired into Coordinated Action Zones routed by a body-wide broadcast network. Three commitments give the SMN its purchase. Haltability - the recruitment of antagonistic affordance into co-activated equilibrium - provides the architectural locus that object-directed phenomenology, in Husserl's sense, requires: opponency enables co-activation, co-activation enables halt, halt enables attention, attention enables intentional directedness, with no module added on top. The dual-signal property of self-modulatable action patterns (SMAPs) makes the self/world distinction a structural feature of the wiring rather than a category the agent applies. And a four-level action-pattern hierarchy - Basal, Haltable, Negotiable, Transactional - gives a single trajectory from autonomic regularity to public conventionalization, locating the conditions for grammar-grounded generativity as architectural transitions. The SMN reconciles the cognitivism-4E debate: recursion lives in the modifiable dynamics of Negotiable Action Patterns, embodiment in the opponent substrate that supports them. A tentative formalism and eight predicted registers (seven testable, one hypothetical), with reference simulations, are given in an appendix.

2605.26855 2026-05-27 cs.CV

Receipt Replay OOD: A Small Benchmark for Screen Replay Detection Under Domain Shift

Receipt Replay OOD: 一个用于域偏移下屏幕重放检测的小型基准

Alexander Vinogradov

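AI summary: To address domain shift in screen replay attack detection, proposes a small receipt-based OOD benchmark for evaluating cross-domain generalization.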

2605.26854 2026-05-27 cs.LG

RAPNet: Accelerating Algebraic Multigrid with Learned Sparse Corrections

RAPNet: 通过学习的稀疏校正加速代数多重网格

Yali Fink, Ido Ben-Yair, Lars Ruthotto, Eran Treister

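AI summary: Proposes RAPNet, a graph neural network framework that learns to generate sparse, robust coarse-grid operators from the sparse algebraic system, addressing the trade-off between sparsity and convergence quality in algebraic multigrid, and uses a level-wise training strategy to generalize to large scales.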

Comments: Proceedings of the 43rd International Conference on Machine Learning, Seoul, South Korea. Code available at https://github.com/idoby/rapnet

Chinese abstract (AI-generated)

大规模稀疏线性系统的可扩展求解是科学计算和图分析中的瓶颈。虽然代数多重网格提供了最优的线性扩展，但其性能受到粗网格算子稀疏性与收敛质量之间权衡的严重限制。经典的代数多重网格启发式方法难以平衡这些目标，常常为了稀疏性而牺牲稳定性或性能。我们提出了RAPNet，一个图神经网络框架，通过学习直接从稀疏代数系统生成稀疏、鲁棒的粗算子来解决这一权衡。我们方法的关键是一种逐层训练策略，该策略能够从小型子图中学习并泛化到百万节点规模的域，绕过了先前神经代数多重网格尝试的瓶颈。RAPNet仅在求解器设置阶段执行，确保求解阶段保持其有利的计算特性。我们展示了我们的方法在多种PDE离散化和图拉普拉斯矩阵上优于经典的非Galerkin基线，使其特别适用于多查询任务，如特征值问题、时间依赖模拟以及逆问题或设计问题。

English abstract

The scalable solution of large sparse linear systems is a bottleneck in scientific computing and graph analysis. While algebraic multigrid (AMG) offers optimal linear scaling, its performance is severely constrained by the trade-off between the sparsity and convergence quality of coarse-grid operators. Classical AMG heuristics struggle to balance these objectives, often sacrificing stability or performance for sparsity. We propose RAPNet, a graph neural network (GNN) framework that resolves this trade-off by learning to generate sparse, robust coarse operators directly from the sparse algebraic system. Key to our approach is a level-wise training strategy that enables learning from small subgraphs and generalization to million-node domains, bypassing the bottlenecks of prior neural AMG attempts. RAPNet executes exclusively during the solver setup phase, ensuring that the solve phase retains its favorable computational properties. We show that our method outperforms classical non-Galerkin baselines on diverse PDE discretizations and graph Laplacians, making it particularly effective for multi-query tasks such as eigenproblems, time-dependent simulations, and inverse or design problems.
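For context, the Galerkin coarse operator that "non-Galerkin" methods replace with a sparser approximation is the triple product $A_c = R A P$, where $P$ is the prolongation and $R$ the restriction (commonly $R = P^T$). The sketch below shows this standard construction on a small 1D Poisson matrix in plain Python. It does not reproduce RAPNet's learned operators, and the helper names are illustrative.

```python
def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    """Return the transpose of a dense matrix."""
    return [list(col) for col in zip(*m)]


def galerkin_coarse_operator(a, p):
    """Return R A P with the common choice R = P^T."""
    return matmul(transpose(p), matmul(a, p))


# 1D Poisson matrix tridiag(-1, 2, -1) on 5 fine points.
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
      for j in range(5)] for i in range(5)]
# Linear interpolation from 2 coarse points located at fine points 1 and 3.
P = [[0.5, 0.0], [1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.0, 0.5]]
A_coarse = galerkin_coarse_operator(A, P)
```

In this 1D case the coarse operator stays tridiagonal. In higher dimensions and on unstructured graphs the triple product typically fills in, which is the sparsity-versus-convergence tension the abstract describes.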

2605.26850 2026-05-27 cs.LG

Learning Energy-Based Models from Stochastic Interpolants using Spatiotemporal Differences

从随机插值中学习基于能量的模型：利用时空差异

Hanlin Yu, RuiKang OuYang, Partha Kaushik, Arto Klami, Michael U. Gutmann, Omar Chehab

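AI summary: Proposes spatiotemporal noise-contrastive estimation (stNCE), a framework that learns energy functions from stochastic interpolants via joint spatiotemporal differences, unifying existing methods and achieving performance competitive with state-of-the-art density estimation methods.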

2605.26849 2026-05-27 cs.CL

Uncertainty-Aware Budget Allocation for Adaptive Test-Time Reasoning

不确定性感知的自适应测试时推理预算分配

Manh Nguyen, Sunil Gupta, Hung Le

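AI summary: Proposes Uncertainty-Aware Budget Allocation (UAB), a framework that reallocates a fixed sampling budget through concave integer optimization based on per-question uncertainty, at no extra inference cost, improving accuracy by up to 3% on average and up to 5% on individual reasoning benchmarks.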

Chinese abstract (AI-generated)

采样多个响应可以改善语言模型的推理能力，但均匀的计算分配效率低下：简单问题被过度采样，而困难问题探索不足。我们提出不确定性感知预算分配（UAB），这是一个凹整数优化框架，基于每问题的不确定性重新分配固定采样预算，且无需额外推理成本。在第一阶段，每个问题生成一个响应；其平均负对数似然（ANLL）直接从输出对数概率中提取，作为难度信号，同时该生成贡献于最终投票。在第二阶段，剩余预算通过边际贪心算法分配，该算法精确求解凹覆盖最大化替代问题：不确定的问题获得更多采样预算，而确定的问题获得更少的额外样本。在六个开源和黑盒模型（参数规模从1.5B到27B）以及五个涵盖数学、逻辑和偏好任务的推理基准上评估，UAB在平均准确率上比基线高出最多3%，在单个基准上高出最多5%，在低资源设置下增益最大，且无需辅助模型或额外的LLM调用。代码公开于 https://github.com/manhitv/UAB。

English abstract

Sampling multiple responses improves language model reasoning, but uniform compute allocation is inefficient: easy questions are over-sampled while hard questions remain under-explored. We propose Uncertainty-Aware Budget Allocation (UAB), a concave integer optimization framework that reallocates a fixed sampling budget based on per-question uncertainty estimated at no additional inference cost. In Phase 1, every question receives one generation; its average negative log-likelihood (ANLL), extracted directly from output log-probabilities, serves as a difficulty signal while the generation contributes to the final vote. In Phase 2, the remaining budget is allocated by a marginal-greedy algorithm that solves a concave coverage-maximization surrogate exactly: uncertain questions receive more sampling budget while confident questions receive fewer additional samples. Evaluated on six open-weight and black-box models spanning 1.5B to 27B parameters and five reasoning benchmarks covering math, logic, and preference tasks, UAB outperforms baselines by up to +3% in average accuracy and up to +5% on individual benchmarks, with the largest gains in low-resource settings, requiring no auxiliary model or additional LLM call. Code is publicly available at https://github.com/manhitv/UAB.
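The two-phase scheme can be sketched as a marginal-greedy loop. The abstract does not give the surrogate objective, so the code below substitutes a hypothetical concave utility, `anll * log(1 + n)` for `n` samples on a question. Greedy allocation by largest marginal gain is optimal for any separable concave objective of this kind, but the utility itself, the function name `allocate_budget`, and its parameters are assumptions and not the authors' formulation.

```python
import heapq
import math


def allocate_budget(anll, total_budget):
    """Split `total_budget` samples across questions by greedy marginal gain.

    `anll` holds one uncertainty score per question (higher = less confident).
    Phase 1 gives every question one sample; Phase 2 hands out the rest.
    The utility anll[i] * log(1 + n) is a hypothetical stand-in.
    """
    n_questions = len(anll)
    if total_budget < n_questions:
        raise ValueError("budget must cover one sample per question")
    counts = [1] * n_questions  # Phase 1: one generation each.

    def marginal_gain(i):
        # Gain from moving question i from counts[i] to counts[i] + 1 samples.
        return anll[i] * (math.log(counts[i] + 2) - math.log(counts[i] + 1))

    # Max-heap via negated gains; exactly one live entry per question.
    heap = [(-marginal_gain(i), i) for i in range(n_questions)]
    heapq.heapify(heap)
    for _ in range(total_budget - n_questions):  # Phase 2.
        _, i = heapq.heappop(heap)
        counts[i] += 1
        heapq.heappush(heap, (-marginal_gain(i), i))
    return counts


# One confident question (low ANLL) and one uncertain question (high ANLL).
plan = allocate_budget([0.1, 1.0], total_budget=6)
```

Because each question's marginal gain shrinks as it receives more samples, the heap naturally shifts budget toward uncertain questions without starving the rest of their Phase 1 sample.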

2605.26844 2026-05-27 cs.LG

Not All Disagreement Is Learnable: Token Teachability in On-Policy Distillation

并非所有分歧都是可学习的：在线策略蒸馏中的Token可教学性

Yuanyi Wang, Su Lu, Yanggan Gu, Pengkai Wang, Yifan Yang, Zhaoyi Yan, Congkai Xie, Jianmin Wu, Hongxia Yang

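AI summary: Proposes Teachability-Aware On-Policy Distillation (TA-OPD), which identifies and selects the token positions where the teacher signal is learnable, and often surpasses full-token distillation while retaining only 5% of tokens.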

Chinese abstract (AI-generated)

在线策略蒸馏（OPD）使用token级别的教师监督在学生的自身轨迹上训练学生。最近的OPD选择性方法通过优先考虑高熵或高分歧token来利用OPD信号的非均匀性。我们重新审视这一原则并问：哪些token级别的教师信号实际上是可学习的？使用固定上下文诊断（测量相同上下文下教师-学生KL散度减少），我们表明原始KL分歧是学习价值的粗略代理。它将可学习分歧（教师将纠正质量分配给学生的top-K候选）与不兼容分歧（教师将质量主要放在学生当前支持范围之外）混为一谈。我们将这种局部兼容性形式化为token可教学性，并表明它比单独的原始KL更好地预测固定上下文的改进。受此发现启发，我们提出可教学性感知的在线策略蒸馏（TA-OPD），一种轻量级的token位置选择方法，无需奖励模型或验证器即可将OPD损失应用于高可教学性位置。在Qwen2.5和Qwen 3教师-学生设置中，TA-OPD通常仅用5%的保留token就超越了全token OPD，并优于基于熵和散度的基线。我们的结果将选择性OPD重新定义为选择可学习的教师信号，而不仅仅是选择显著的token。

English abstract

On-policy distillation (OPD) trains a student on its own rollouts with token-level teacher supervision. Recent selective OPD methods exploit the non-uniformity of OPD signals by prioritizing high-entropy or high-disagreement tokens. We revisit this principle and ask: which token-level teacher signals are actually learnable? Using a fixed-context diagnostic that measures same-context teacher-student KL reduction, we show that raw KL disagreement is a coarse proxy for learning value. It conflates learnable disagreement, where the teacher assigns corrective mass to the student's top-K candidates, with incompatible disagreement, where the teacher places mass mostly off the student's current support. We formalize this local compatibility as token teachability and show that it better predicts fixed-context improvement than raw KL alone. Motivated by this finding, we propose Teachability-Aware OPD (TA-OPD), a lightweight token-position selection method that applies OPD loss to high-teachability positions without reward models or verifiers. Across Qwen2.5 and Qwen 3 teacher-student settings, TA-OPD often surpasses full-token OPD with only 5% retained tokens and improves over entropy- and divergence-based baselines. Our results reframe selective OPD as selecting learnable teacher signals rather than merely salient tokens.
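The distinction between learnable and incompatible disagreement can be illustrated with a small proxy. The abstract does not give the formal definition of token teachability, so the sketch below assumes a simple stand-in: a position counts as compatible when the teacher places at least `min_mass` of its probability on the student's top-K tokens, and compatible positions are then ranked by teacher-student KL. All function names, `min_mass`, and the ranking rule are hypothetical.

```python
import math


def top_k(probs, k):
    """Indices of the k largest probabilities."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]


def support_mass(teacher, student, k):
    """Teacher probability mass that falls on the student's top-k tokens."""
    return sum(teacher[i] for i in top_k(student, k))


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / max(qi, eps))
               for pi, qi in zip(p, q) if pi > 0)


def select_positions(teacher_dists, student_dists, k=2, min_mass=0.5,
                     keep_ratio=0.05):
    """Keep the highest-KL positions among those the proxy deems compatible."""
    scores = []
    for t, s in zip(teacher_dists, student_dists):
        compatible = support_mass(t, s, k) >= min_mass
        # Incompatible positions score zero regardless of how large KL is.
        scores.append(kl_divergence(t, s) if compatible else 0.0)
    n_keep = max(1, int(keep_ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


student = [0.6, 0.3, 0.05, 0.05]
teacher_off_support = [0.05, 0.05, 0.1, 0.8]  # mass mostly outside top-2
teacher_corrective = [0.3, 0.6, 0.05, 0.05]   # reorders the student's top-2
chosen = select_positions([teacher_off_support, teacher_corrective],
                          [student, student], keep_ratio=0.5)
```

In this toy case the off-support position has the larger raw KL, yet the compatible position is the one selected, which mirrors the abstract's point that raw KL is a coarse proxy for learning value.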

2605.26842 2026-05-27 cs.LG cs.CL

MONA: Muon Optimizer with Nesterov Acceleration for Scalable Language Model Training

MONA: 基于Nesterov加速的Muon优化器用于可扩展语言模型训练

Jiacheng Li, Jianchao Tan, Hongtao Xu, Jiaqi Zhang, Yifan Lu, Yerui Sun, Yuchen Xie, Xunliang Cai

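AI summary: Proposes the MONA optimizer, which integrates a Nesterov acceleration term into Muon's gradient processing pipeline for curvature-aware acceleration that helps escape sharp local minima, and achieves better convergence and downstream task performance in Mixture-of-Experts pretraining from 1B to 68B parameters.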

Chinese abstract (AI-generated)

Muon优化器最近为大型语言模型训练提供了一种有希望的AdamW替代方案，利用矩阵正交化产生几何感知更新。然而，与所有一阶方法一样，Muon可能会陷入尖锐的局部最小值。在这项工作中，我们提出了MONA，一种将Muon的正交化框架与曲率感知加速相结合的优化器。MONA直接将加速项添加到Muon的梯度处理流程中。该加速项根据梯度差异的指数移动平均计算得出。我们提供了MONA的详细收敛性分析，表明加速项能够在保持Muon谱范数正则化的同时逃离尖锐最小值。实验上，在从1B到68B参数的三个规模的混合专家预训练中（最大模型在1万亿tokens上训练），MONA在收敛性和下游任务性能上均优于Muon和AdamW。此外，我们在MOE-68B-A3B模型上进行了监督微调，并在通用能力、数学推理和代码生成基准上评估，MONA达到了最先进的性能。

English abstract

The Muon optimizer has recently offered a promising alternative to AdamW for large language model training, leveraging matrix orthogonalization to produce geometry-aware updates. However, like all first-order methods, Muon can become trapped in sharp local minima. In this work, we present MONA, an optimizer that bridges Muon's orthogonalization framework with curvature-aware acceleration. MONA adds an acceleration term directly into Muon's gradient processing pipeline. This term is calculated from the exponential moving average of gradient differences. We provide a detailed convergence analysis for MONA, showing that the acceleration term enables escape from sharp minima while preserving Muon's spectral-norm regularization. Empirically, MONA achieves better convergence and downstream task performance compared to both Muon and AdamW across three scales of Mixture-of-Experts pretraining, spanning from 1B to 68B parameters, with the largest model trained on 1 trillion tokens. Furthermore, we conduct supervised fine-tuning on the MOE-68B-A3B model and evaluate it on general capability, mathematical reasoning, and code generation benchmarks, where MONA achieves SOTA performance.

2605.26840 2026-05-27 cs.CL

Optimising Factual Consistency in Summarisation via Preference Learning from Multiple Imperfect Metrics

通过来自多个不完美指标的偏好学习优化摘要的事实一致性

Yuxuan Ye, Raul Santos-Rodriguez, Edwin Simpson

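AI summary: Proposes an automated training pipeline that aggregates scores from multiple weak factuality metrics, maps them to preferences, filters out high-disagreement samples, and performs preference learning on lexically similar summary pairs to improve the factual consistency of summaries.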

DOI: 10.18653/v1/2025.findings-emnlp.940
Comments: EMNLP 2025 Findings

Chinese abstract (AI-generated)

使用评估指标作为奖励的强化学习被广泛用于增强语言模型的特定能力。然而，对于事实一致性摘要等任务，现有指标仍不完善，限制了其作为塑造模型行为的信号的有效性。虽然单个事实性指标不可靠，但它们的组合可以更有效地捕捉多样的事实错误。我们利用这一见解，引入了一种自动化训练流程，通过聚合来自不同弱指标的分数来提高摘要的事实一致性。我们的方法通过将分数映射到偏好并过滤掉指标之间高度不一致的情况，避免了复杂的奖励塑造。对于每个源文档，我们通过改变解码策略生成词汇相似的摘要对，使模型能够从由细微词汇差异引起的事实差异中学习。这种方法仅使用源文档构建高质量偏好数据集。实验表明，从早期的编码器-解码器架构到现代大型语言模型，模型均获得一致的事实性提升，较小的模型能达到与较大模型相当的事实性。

English abstract

Reinforcement learning with evaluation metrics as rewards is widely used to enhance specific capabilities of language models. However, for tasks such as factually consistent summarisation, existing metrics remain underdeveloped, limiting their effectiveness as signals for shaping model behaviour. While individual factuality metrics are unreliable, their combination can more effectively capture diverse factual errors. We leverage this insight to introduce an automated training pipeline that improves factual consistency in summaries by aggregating scores from different weak metrics. Our approach avoids the need for complex reward shaping by mapping scores to preferences and filtering out cases with high disagreement between metrics. For each source document, we generate lexically similar summary pairs by varying decoding strategies, enabling the model to learn from factual differences caused by subtle lexical differences. This approach constructs a high-quality preference dataset using only source documents. Experiments demonstrate consistent factuality gains across models, ranging from early encoder-decoder architectures to modern large language models, with smaller models reaching comparable factuality to larger ones.
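The step of mapping metric scores to preferences and dropping high-disagreement cases can be sketched as a vote across metrics. The abstract does not specify the aggregation rule, so majority voting with a `min_agreement` threshold is an assumption made here, as are the function name `build_preference` and the example scores.

```python
def build_preference(scores_a, scores_b, min_agreement=0.75):
    """Turn per-metric scores for two summaries into a preference label.

    Each metric votes for the summary it scores higher. Returns "a" or "b"
    when at least `min_agreement` of the metrics agree, and None when the
    metrics disagree too much, so that the pair is filtered out.
    """
    if not scores_a or len(scores_a) != len(scores_b):
        raise ValueError("need the same non-empty set of metrics for both")
    n_metrics = len(scores_a)
    votes_a = sum(1 for x, y in zip(scores_a, scores_b) if x > y)
    votes_b = sum(1 for x, y in zip(scores_a, scores_b) if y > x)
    if votes_a / n_metrics >= min_agreement:
        return "a"
    if votes_b / n_metrics >= min_agreement:
        return "b"
    return None  # High disagreement: exclude this pair from training.


# Three of four metrics prefer summary A, so the pair is kept.
kept = build_preference([0.9, 0.8, 0.7, 0.6], [0.5, 0.4, 0.3, 0.7])
# The two metrics contradict each other, so the pair is dropped.
dropped = build_preference([0.9, 0.2], [0.1, 0.8])
```

Voting only on which summary each metric prefers sidesteps differences in score scale between metrics, which is one way to avoid hand-tuned reward shaping.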

2605.26835 2026-05-27 cs.AI

Helicase: Uncertainty-Guided Supply Chain Knowledge Graph Construction with Autonomous Multi-Agent LLMs

Helicase: 不确定性引导的供应链知识图谱构建与自主多智能体大语言模型

Yunbo Long, Haolang Zhao, Ge Zheng, Alexandra Brintrup

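AI summary: Proposes Helicase, an autonomous multi-agent LLM system that uses uncertainty-guided iterative verification and knowledge graph construction to tackle structural inference problems in supply chains that require multi-hop reasoning, and introduces the SCQA benchmark for evaluation.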

Chinese abstract (AI-generated)

基于大语言模型的多智能体系统已被广泛用于知识检索和报告生成，通过网页搜索和文本推理综合已知信息。然而，供应链中的许多关键信息任务并非简单的一次性查询：它们是结构化推断问题，需要在复杂、碎片化的网络资源中进行多跳推理。诸如“特斯拉哪些组件使用了来自澳大利亚矿山的锂？”之类的问题在任何单一文档中都没有答案；答案必须通过自主构建和分析从碎片化、异构来源中组装起来的动态知识图谱，以计算方式合成。此外，这种发现过程必须具有不确定性意识：决策不仅依赖于答案，还依赖于对其可靠性的校准置信度，该置信度可追溯到来源质量和推理一致性。为了解决这一能力差距，我们提出了Helicase，一种用于不确定性引导的供应链知识图谱构建的自主多智能体大语言模型系统。Helicase将高层供应链查询分解为可执行的调查计划，通过迭代验证循环协调专门的网页搜索、推理和编码智能体，并逐步构建带有每个事实不确定性注释的查询特定供应链知识图谱。其三层不确定性框架在行动、轨迹和记忆层跟踪不确定性，从而实现结构化推断和校准置信度评估。为了评估整个复杂性谱系中的自主推理，我们引入了SCQA（供应链查询评估），这是一个包含80个供应链查询的基准，这些查询组织成四个象限，涵盖单跳到多跳推理，在高低数据可见性下进行。

English abstract

LLM-based multi-agent systems have been widely adopted for knowledge retrieval and report generation, synthesizing known information through web search and textual reasoning. However, many critical information tasks in supply chains are not simple one-shot queries: they are structural inference problems requiring multi-hop reasoning across complex, fragmented web resources. Questions such as "Which Tesla components use lithium from Australian mines?" have no answer in any single document; answers must be computationally synthesized through the autonomous construction and analysis of dynamic knowledge graphs assembled from fragmented, heterogeneous sources. Moreover, such discovery processes must be uncertainty-aware: decisions depend not only on answers but on calibrated confidence in their reliability, traceable to source quality and reasoning consistency. To address this capability gap, we propose Helicase, an autonomous multi-agent LLM system for uncertainty-guided supply chain knowledge graph construction. Helicase decomposes high-level supply-chain queries into executable investigation plans, coordinates specialized web-search, reasoning, and coding agents through iterative verification loops, and incrementally constructs query-specific supply chain knowledge graphs with per-fact uncertainty annotations. Its three-layer uncertainty framework tracks uncertainty at the action, trajectory, and memory layers, enabling both structural inference and calibrated confidence assessment. To evaluate autonomous reasoning across the full complexity spectrum, we introduce SCQA (Supply Chain Query Assessment), a benchmark of 80 supply chain queries organized into four quadrants spanning single-hop to multi-hop inference under both high and low data visibility.

2605.26833 2026-05-27 cs.LG cs.AI

Periodic Topological Deep Learning for Polymer Design and Discovery

周期性拓扑深度学习用于聚合物设计与发现

Yasharth Yadav, Tze Kwang Gerald Er, Atsushi Goto, Kelin Xia

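AI summary: Proposes Periodic-TDL, a deep learning framework built on periodic Vietoris-Rips complexes and hierarchical simplicial message passing; by capturing many-body interactions and long-range information it outperforms existing models on polymer property prediction, and it validates that ester-to-amide substitution and α-methylation improve thermal stability.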

Comments: 19 pages, 3 figures, 3 tables

Chinese abstract (AI-generated)

聚合物支撑着能源、医疗和材料科学领域的应用，但其广阔的化学空间使得系统性发现充满挑战。大多数机器学习方法将聚合物表示为单个重复单元的分子图，从而忽略了聚合物链的周期性和超越成对键的多体相互作用。我们提出了Periodic-TDL，一个基于周期性Vietoris-Rips复形的深度学习框架，该复形捕捉跨多个空间尺度的多体相互作用，随后通过层次单纯形消息传递（HSMP）编码器将信息从长程相互作用传播到共价键，产生由高阶拓扑特征增强的表征。Periodic-TDL在涵盖电子、光学、物理和热学目标的聚合物性质预测任务中优于所有最先进的模型。此外，我们定量验证了酯到酰胺取代和α-甲基化如何增强热稳定性。使用通过系统取代丙烯酸酯和丙烯酰胺聚合物生成的计算合成数据集（48,208个结构），我们观察到在匹配的聚合物对中，酯到酰胺取代的平均$T_g$增加约$55^\circ$C，主链α-甲基化的平均$T_g$增加约$14^\circ$C。为了验证这些预测趋势，我们使用Periodic-TDL模型分析了来自独立实验测量的六对新型聚合物，包括三篇文献中未报道的新合成聚合物。实验数据成功证实了模型的预测。最终，这些发现表明Periodic-TDL捕捉了特定官能团修饰的潜在物理效应，而不仅仅是优化基准数据集上的预测性能。

English abstract

Polymers underpin applications across energy, healthcare, and materials science, yet their vast chemical space makes systematic discovery challenging. Most machine learning approaches represent polymers as molecular graphs of a single repeating unit, thereby missing both the periodicity of polymer chains and many-body interactions beyond pairwise bonds. We introduce Periodic-TDL, a deep learning framework built on periodic Vietoris-Rips complexes that capture many-body interactions across multiple spatial scales, followed by a hierarchical simplicial message-passing (HSMP) encoder that propagates information from long-range interactions to covalent bonds, yielding representations enriched by higher-order topological features. Periodic-TDL outperforms all state-of-the-art models across polymer property prediction tasks spanning electronic, optical, physical, and thermal targets. Furthermore, we quantitatively validate how ester-to-amide substitution and $\alpha$-methylation enhance thermal stability. Using a computationally synthesized dataset of 48,208 structures, generated via systematic substitution of acrylate and acrylamide polymers, we observed a mean $T_g$ increase of $\sim 55^\circ$C for ester-to-amide substitutions and $\sim 14^\circ$C for backbone $\alpha$-methylation across matched polymer pairs. To verify these predicted trends, we use our Periodic-TDL model to analyze six novel polymer pairs from independent experimental measurements, including three newly synthesized polymers previously unreported in the literature. The experimental data successfully confirmed the model's predictions. Ultimately, these findings demonstrate that Periodic-TDL captures the underlying physical effects of specific functional group modifications, rather than merely optimizing predictive performance on benchmark datasets.


2605.26831 2026-05-27 cs.CV cs.RO

OSMa-Bench++: Toward Open-Ended Benchmarking of Semantic Mapping for Manipulation with Prompt-Generated Synthetic Scenes

OSMa-Bench++：面向操作任务的语义映射开放基准测试，使用提示生成的合成场景

Regina Kurkova, Maxim Popov, Sergey Kolyubin

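AI summary: This paper extends OSMa-Bench with prompt-generated synthetic indoor scenes for controllable benchmarking, and adds a prompt-grounded VQA question category for stress-testing semantic mapping methods under clutter, small objects, partial occlusions, and lighting variation.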

Comments: Code: https://github.com/be2rlab/OSMa-Bench-v2

AI中文摘要

语义映射方法越来越多地被用作下游机器人推理和操作的中间场景表示，但它们的评估仍然很大程度上依赖于固定的基准数据集，这些数据集对操作相关边缘情况的覆盖有限。在这项工作中，我们将OSMa-Bench扩展到使用提示生成的合成室内场景进行可控基准测试。我们的流程自动生成场景描述，使用SceneSmith合成相应环境，并将生成的资产适配为OSMa-Bench兼容的仿真格式。这种适配需要一个非平凡的中层，包括语义归一化、材质和纹理修复、着色器回退策略、地面处理、导航设置和受控光照配置。所提出设置的一个关键优势是原始场景生成提示是预先已知的，因此可以作为预期场景的辅助语义规范。我们利用这一特性，将OSMa-Bench的VQA组件扩展了一个基于提示的问题类别。由此产生的框架支持在杂乱、小物体、部分遮挡和光照变化等条件下对语义场景表示进行有针对性的压力测试，并使基准测试更具可扩展性，更好地与下游操作需求对齐。我们的代码可在https://github.com/be2rlab/OSMa-Bench-v2获取。

英文摘要

Semantic mapping methods are increasingly used as intermediate scene representations for downstream robotic reasoning and manipulation, yet their evaluation is still largely tied to fixed benchmark datasets with limited coverage of manipulation-relevant corner cases. In this work, we extend OSMa-Bench toward controllable benchmarking with prompt-generated synthetic indoor scenes. Our pipeline automatically generates scene descriptions, synthesizes corresponding environments with SceneSmith, and adapts the resulting assets into an OSMa-Bench-compatible simulation format. This adaptation requires a nontrivial intermediate layer, including semantic normalization, material and texture repair, shader fallback policies, floor handling, navigation setup, and controlled lighting configuration. A key advantage of the proposed setup is that the original scene-generation prompt is known in advance and can therefore serve as an auxiliary semantic specification of the intended scene. We use this property to extend the VQA component of OSMa-Bench with a prompt-grounded question category. The resulting framework supports targeted stress-testing of semantic scene representations under conditions such as clutter, small objects, partial occlusions, and lighting variation, and makes benchmarking more extensible and better aligned with downstream manipulation requirements. Our code is available at https://github.com/be2rlab/OSMa-Bench-v2.


2605.26830 2026-05-27 cs.LG cs.AI cs.CV

The Kalman Evolve: Closing the Gap in Kalman Filtering via Interpretable Algorithm Discovery

卡尔曼演化：通过可解释算法发现缩小卡尔曼滤波的差距

Vasileios Saketos, Ming Xiao

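AI summary: To address the degradation of Kalman filtering in nonlinear sensing settings, the paper proposes Kalman Evolve, a framework that jointly optimizes noise parameters and the update structure, using large language models to generate interpretable non-affine modifications and achieving up to 12% RMSE reduction across several benchmarks.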

AI中文摘要

状态估计是控制和信号处理中的一个基本问题，卡尔曼滤波器在线性动力学、高斯噪声和已知噪声协方差下提供最优解。然而，这些假设在多普勒雷达和LiDAR等实际传感场景中常常不成立。在这些情况下，最优估计器本质上是非线性的，导致系统性能下降。这产生了一个仅通过调整噪声协方差参数（即卡尔曼滤波器中的过程噪声和测量噪声）无法消除的性能差距。为了解决这一限制，我们提出了Kalman Evolve，一个通过联合优化噪声参数和更新结构来发现改进滤波算法的框架。我们的方法利用大语言模型作为程序空间上的结构化先验，能够生成对经典卡尔曼滤波器的可解释、非仿射修改，同时保留其递归形式。我们提供了分析结果，证明了在常见非线性传感模型下仿射估计器的次优性，从而激发了结构感知更新的必要性。在一系列合成和真实跟踪基准测试中，包括多普勒雷达、基于LiDAR的定位和行人跟踪，所发现的算法始终优于强基线（如优化卡尔曼滤波器），实现了高达12%的RMSE降低。这些结果表明，优化卡尔曼滤波器的结构而不仅仅是其参数，提供了一种实用且可解释的方式来改进状态估计。

英文摘要

State estimation is a fundamental problem in control and signal processing, for which the Kalman Filter provides an optimal solution under linear dynamics, Gaussian noise, and known noise covariances. However, these assumptions often fail in realistic sensing settings such as Doppler radar and LiDAR. In these cases, the optimal estimator is inherently nonlinear, which leads to systematic performance degradation. This creates a performance gap that cannot be eliminated by tuning the noise covariance parameters (i.e., the process and measurement noise in the Kalman Filter) alone. To address this limitation, we propose Kalman Evolve, a framework for discovering improved filtering algorithms by jointly optimizing both noise parameters and the update structure. Our approach leverages large language models (LLMs) as a structured prior over program space, enabling the generation of interpretable, non-affine modifications to the classical Kalman filter while preserving its recursive form. We provide analytical results establishing the suboptimality of affine estimators under common nonlinear sensing models, motivating the need for structure-aware updates. Across a range of synthetic and real-world tracking benchmarks, including Doppler radar, LiDAR-based localization, and pedestrian tracking, the discovered algorithms consistently improve over strong baselines such as the Optimized Kalman Filter, achieving up to 12% reduction in RMSE. These results suggest that optimizing the structure of the Kalman filter, rather than only its parameters, provides a practical and interpretable way to improve state estimation.
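
For reference, the classical update that the abstract takes as its starting point can be written in a few lines. The sketch below is the textbook scalar Kalman filter, not any algorithm discovered by Kalman Evolve; the function name `kalman_step` and the scalar model are assumptions made for illustration.

```python
def kalman_step(x, p, z, a=1.0, h=1.0, q=0.0, r=1.0):
    """One predict/update cycle of the classical scalar Kalman filter.

    Assumed model: x_k = a * x_{k-1} + w,  z_k = h * x_k + v,
    with process noise variance q and measurement noise variance r.
    """
    # Predict the state and its variance.
    x_pred = a * x
    p_pred = a * p * a + q
    # Update. The gain k does not depend on z, so x_new is affine in z.
    k = p_pred * h / (h * p_pred * h + r)
    x_new = x_pred + k * (z - h * x_pred)
    p_new = (1.0 - k * h) * p_pred
    return x_new, p_new

estimate, variance = kalman_step(x=0.0, p=1.0, z=2.0)
```

Tuning `q` and `r` changes the gain but leaves the update affine in the measurement `z`; this is the structural limitation the abstract refers to when it argues that parameter tuning alone cannot close the gap under nonlinear sensing.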


2605.26828 2026-05-27 cs.RO

Learning Compositional Symbolic Task Rules from Demonstrations with Inductive Logic Programming

通过归纳逻辑编程从演示中学习组合符号任务规则

Oleh Borys, Karla Stepanova

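AI summary: Proposes a decomposed learning approach based on inductive logic programming that learns symbolic task rules from demonstrations; the learned rules are interpretable, reusable, and support strong generalization.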

Comments: In: ICRA 2026 Workshop on Semantics for Reliable Robot Autonomy: From Environment Understanding and Reasoning to Safe Interaction, Vienna, 2026. In: ICRA 2026, International Joint Workshop on Ontologies, Semantic Maps and Autonomous Robotics Standardization (J-WOSMARS 2026), Vienna, 2026.

AI中文摘要

从演示中学习不仅应捕捉任务如何执行，还应解释演示行为的高层任务结构。随着机器人变得更加自主，这种任务表示必须可检查、可重用且人类可解释。为此，我们研究如何通过归纳逻辑编程（ILP）表示和学习机器人任务，将复杂任务分解为不同抽象（本体）层次上的一系列更简单的学习目标。该系统从演示和先验（领域）知识中推断符号规则，并在学习更高层任务结构时重用已学习的规则。我们在一个合成的积木组装场景中评估了该方法，结果表明学习到的抽象是可解释的，并支持对更难的、包含未见物体的保留任务进行强泛化。这些结果初步证明分解的ILP是实现任务级LfD的可行方法。

英文摘要

Learning from Demonstration (LfD) should capture not only how a task is executed, but also its high-level task structure that explains the demonstrated behavior. As robots become more autonomous, such task representations must be inspectable, reusable, and human-interpretable. To address this, we study how to represent and learn robotic tasks with inductive logic programming (ILP) by decomposing a complex task into a series of simpler learning objectives at different abstraction (ontological) levels. The system infers symbolic rules from demonstrations and prior (domain) knowledge, and reuses learned rules when learning higher-level task structure. We evaluate the approach in a synthetic block-assembly scenario and show that the learned abstractions are interpretable and support strong generalization to harder, held-out tasks with unseen objects. These results provide preliminary evidence that decomposed ILP is a feasible approach to task-level LfD.


2605.26827 2026-05-27 cs.CL cs.AI

ContextGuard: Structured Self-Auditing for Context Learning in Language Models

ContextGuard: 语言模型中上下文学习的结构化自我审计

Hongbo Jin, Chi Wang, Haoran Tang, Zhongjing Du, Xu Jiang, Jingqi Tian, Qiaoman Zhang, Jiayu Ding

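AI summary: Proposes ContextGuard, a framework whose structured self-auditing mechanism helps large language models faithfully follow all contextual constraints in complex context tasks, including peripheral, persistent, and format-sensitive requirements.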

2605.26823 2026-05-27 cs.CL

Generating Logically Consistent Synthetic Supply Chain Data with LLM-Driven Knowledge Graph Reasoning

基于LLM驱动知识图谱推理生成逻辑一致的合成供应链数据

Yunbo Long, Ge Zheng, Liming Xu, Alexandra Brintrup

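AI summary: Because synthetic supply chain data must stay consistent with operational logic, the paper proposes TabKG, which builds a column relationship knowledge graph, validates candidate relationships proposed by a multi-LLM ensemble, and combines the graph with a latent diffusion model to generate logically consistent tabular data.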

AI中文摘要

合成数据为供应链分析中两个长期存在的障碍（数据稀缺和数据隐私）提供了一种有前景的解决方案。然而，要使合成数据支持运营模拟和决策，它必须不仅再现真实记录的统计分布，还要保留支配供应链流程的“操作逻辑”，包括时间顺序、数学依赖、层次分类和条件规则，这些使记录在操作上合理。我们将这种逻辑视为供应链数据的“物理”。现有的表格生成模型主要针对分布保真度和下游预测效用进行优化，因此通常生成统计上看似真实但违反基本操作约束的记录。本文介绍了TabKG，一个知识图谱引导的框架，用于生成逻辑一致的合成供应链表格数据。TabKG构建了一个列关系知识图谱（CR-KG）来表示数据操作依赖。它使用多LLM集成和多数投票从列元数据中提出候选关系，通过真实数据验证这些关系以去除幻觉或未支持的边，然后使用验证后的CR-KG指导生成。具体而言，TabKG将原始表压缩为独立列，使用潜在扩散模型生成这些列，并根据验证后的关系确定性地重建依赖列，从而通过构造强制与发现的操作规则保持逻辑一致性。

英文摘要

Synthetic data offers a promising solution to two persistent barriers in supply chain analytics: data scarcity and data privacy. However, for synthetic data to support operational simulation and decision-making, it must do more than reproduce the statistical distributions of real records; it must also preserve the operational logic that governs supply chain processes, including the temporal orderings, mathematical dependencies, hierarchical taxonomies, and conditional rules that make a record operationally plausible. We consider this logic as the "physics" of supply chain data. Existing tabular generative models are primarily optimized for distributional fidelity and downstream predictive utility, and therefore often generate records that appear statistically realistic but violate fundamental operational constraints. This paper introduces TabKG, a knowledge-graph-guided framework for logically consistent synthetic supply chain tabular data generation. TabKG constructs a Column Relationship Knowledge Graph (CR-KG) to represent data operational dependencies. It uses a multi-LLM ensemble with majority voting to propose candidate relationships from column metadata, validates these relationships against real data to remove hallucinated or unsupported edges, and then uses the validated CR-KG to guide generation. Specifically, TabKG compresses the original table into independent columns, generates these columns using a latent diffusion model, and deterministically reconstructs dependent columns according to the validated relationships, enforcing logical consistency by construction with respect to the discovered operational rules.
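
The "generate independent columns, then reconstruct dependent ones" step can be sketched as follows. This is only an illustration of consistency by construction, not TabKG itself: a seeded random sampler stands in for the latent diffusion model, and the column names and the two rules (a delivery date derived from a lead time, a total derived from price and quantity) are hypothetical.

```python
import random
from datetime import date, timedelta

def sample_independent(rng):
    """Stand-in for the generative model: draws only the independent columns."""
    return {
        "order_date": date(2025, 1, 1) + timedelta(days=rng.randrange(365)),
        "lead_days": rng.randint(1, 30),
        "unit_price": round(rng.uniform(1.0, 100.0), 2),
        "quantity": rng.randint(1, 50),
    }

def reconstruct(row):
    """Derive dependent columns deterministically from (assumed) validated rules."""
    out = dict(row)
    # Temporal ordering: delivery always follows the order by the lead time.
    out["delivery_date"] = row["order_date"] + timedelta(days=row["lead_days"])
    # Mathematical dependency: total is price times quantity.
    out["total_price"] = round(row["unit_price"] * row["quantity"], 2)
    return out

rng = random.Random(0)
records = [reconstruct(sample_independent(rng)) for _ in range(5)]
```

Because the dependent columns are never sampled, a record cannot violate the encoded rules, whatever the quality of the generator for the independent columns.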


2605.26821 2026-05-27 hep-ph cs.LG hep-ex

Particle-Lund Multimodality in Jet Taggers

喷注标记器中的粒子-Lund多模态

Loukas Gouskos, Benedikt Maier

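AI summary: Proposes PLuM, a multimodal architecture that jointly processes particle constituents and Lund plane splittings, using cross-attention to study whether explicit QCD hierarchy complements raw particle representations; it finds systematic gains for top-quark and H→bb tagging and 25% higher background rejection in the HH(4b) analysis.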

AI中文摘要

Lund平面提供了喷注内QCD辐射的物理动机层次表示，而基于变换器的标记器通过直接从原始粒子成分及其成对关系中学习达到了最先进的性能。我们研究变换器是否从成分级输入隐式捕获层次QCD结构，或者显式物理表示是否仍然具有互补性。为了测试这一点，我们引入了PLuM，一种多模态架构，将粒子成分和Lund平面分裂投影到共享潜在空间，并用统一变换器联合处理两者。交叉注意力允许模型探测结构化QCD信息是否提供了超出粒子单独编码的区分能力。我们观察到顶夸克和H→bb标记的系统性增益，而在H→cc或H→4q拓扑中没有发现可比改进。这种选择性增强表明，即使在高度表达性的架构中，关于b喷注形成的显式层次信息仍然与原始粒子表示互补，而其他拓扑已经在成分级被很好地捕获。对于高影响LHC分析，如洛伦兹增强的双希格斯玻色子搜索中的四b夸克末态（HH(4b)），增益显著：在25%的双希格斯效率工作点，PLuM的背景抑制比基线高25%。我们的结果表明，在变换器时代，QCD辐射的物理结构化表示仍然保留区分价值，激励进一步研究深度学习算法如何编码喷注动力学的不同方面。

英文摘要

The Lund plane offers a physics-motivated, hierarchical representation of QCD radiation within jets, while transformer-based taggers have reached state-of-the-art performance by learning directly from raw particle constituents and their pairwise relations. We investigate whether transformers implicitly capture hierarchical QCD structure from constituent-level inputs, or whether explicit physics representations remain complementary. To test this, we introduce PLuM, a multimodal architecture that projects particle constituents and Lund plane splittings into a shared latent space, processing both jointly with a unified transformer. Cross-attention allows the model to probe whether structured QCD information provides discriminating power beyond what particles alone encode. We observe systematic gains for top-quark and $\mathrm{H}\to\mathrm{b}\bar{\mathrm{b}}$ tagging, while finding no comparable improvement for $\mathrm{H}\to\mathrm{c}\bar{\mathrm{c}}$ or $\mathrm{H}\to 4\mathrm{q}$ topologies. This selective enhancement suggests that explicit hierarchical information about b-jet formation remains complementary to raw particle representations even in highly expressive architectures, while other topologies are already well-captured at constituent level. For high-impact LHC analyses such as Lorentz-boosted di-Higgs searches in the four $\mathrm{b}$ quark final state ($\mathrm{H}\mathrm{H}(4\mathrm{b})$), the gains are substantial: at a $25\%$ di-Higgs efficiency working point, PLuM achieves $25\%$ higher background rejection than the baseline. Our results indicate that physically structured representations of QCD radiation retain discriminating value in the transformer era, motivating further study into how different aspects of jet dynamics are encoded by deep learning algorithms.
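
As background for the Lund plane inputs, each splitting of a jet into a harder branch a and a softer branch b is commonly mapped to the coordinates (ln 1/ΔR, ln k_t), with ΔR the rapidity-azimuth separation and k_t = p_T,b · ΔR. The sketch below computes these standard quantities for one splitting; the function name and argument layout are assumptions, and it says nothing about how PLuM embeds them.

```python
import math

def lund_coordinates(pt_a, y_a, phi_a, pt_b, y_b, phi_b):
    """Return (ln(1/dR), ln(kt), z) for a splitting into subjets a and b."""
    # Make b the softer branch.
    if pt_b > pt_a:
        pt_a, y_a, phi_a, pt_b, y_b, phi_b = pt_b, y_b, phi_b, pt_a, y_a, phi_a
    # Wrap the azimuthal difference into [-pi, pi).
    dphi = (phi_a - phi_b + math.pi) % (2.0 * math.pi) - math.pi
    delta_r = math.hypot(y_a - y_b, dphi)
    kt = pt_b * delta_r              # transverse momentum of b relative to a
    z = pt_b / (pt_a + pt_b)         # momentum fraction carried by the softer branch
    return math.log(1.0 / delta_r), math.log(kt), z

ln_inv_dr, ln_kt, z = lund_coordinates(100.0, 0.0, 0.0, 10.0, 0.0, 0.1)
```

Applying this to every step of a declustering sequence yields the ordered list of splittings that a hierarchical representation of the jet is built from.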


2605.26820 2026-05-27 cs.RO

Can VLA Models Learn from Real-World Data Continually without Forgetting?

VLA 模型能否从现实世界数据中持续学习而不遗忘？

Jiarun Zhu, Yijun Hong, Xiaoquan Sun, Zetian Xu, Mingqi Yuan, Zhiyong Wang, Wenjun Zeng, Jiayu Chen

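AI summary: By building a real-world continual learning dataset of four sequential manipulation tasks, the study finds empirically that vision-language-action (VLA) models suffer severe catastrophic forgetting when continually learning from heterogeneous real-world demonstrations, and it systematically evaluates the key implementation factors of experience replay.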

AI中文摘要

视觉-语言-动作（VLA）模型为通用机器人提供了有前景的基础。然而，它们在现实场景中的成功部署需要能够持续获取新技能，同时保留先前学习的行为。虽然开创性研究在狭窄的模拟环境中研究了VLA模型的持续学习，但在现实条件下这一挑战仍未得到充分探索。为解决这一局限，我们构建了一个真实世界的持续学习数据集，包含四个顺序操作任务，涵盖刚体抓取放置、接触式按压和可变形物体折叠。利用该数据集，我们进行了全面实验，发现VLA模型在持续学习异构真实世界演示时遭受显著的灾难性遗忘。然后，我们系统评估了经验回放，并揭示了决定其成功的关键实施因素。总之，这项工作提供了真实世界持续VLA学习的首次实证研究，并为部署长期运行的机器人策略提供了实用指导。

英文摘要

Vision-language-action (VLA) models provide a promising foundation for general-purpose robotics. However, their successful deployment in real-world scenarios requires the ability to continually acquire new skills while retaining previously learned behaviors. While pioneering research has studied the continual learning of VLA models in narrowly simulated environments, this challenge remains largely unexplored under realistic conditions. To address this limitation, we construct a real-world continual learning dataset comprising four sequential manipulation tasks, spanning rigid-object pick-and-place, contact-rich pressing, and deformable-object folding. Using this dataset, we conduct comprehensive experiments and find that VLA models suffer significant catastrophic forgetting when continually learning from heterogeneous real-world demonstrations. We then systematically evaluate experience replay and uncover key implementation factors that govern its success. In summary, this work provides the first empirical study of real-world continual VLA learning and offers practical guidance for deploying long-lived robot policies.
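
Experience replay, the mitigation evaluated in the abstract, amounts to mixing stored demonstrations from earlier tasks into each training batch for the new task. The following is a generic sketch of that idea, not the paper's training setup; `make_batch`, `replay_ratio`, and the tuple-shaped demonstrations are hypothetical.

```python
import random

def make_batch(new_demos, replay_buffer, batch_size, replay_ratio, rng):
    """Mix demonstrations of the current task with replayed earlier ones."""
    # Number of replayed samples, capped by what the buffer holds.
    n_replay = min(int(round(batch_size * replay_ratio)), len(replay_buffer))
    n_new = batch_size - n_replay
    batch = rng.sample(replay_buffer, n_replay) + rng.choices(new_demos, k=n_new)
    rng.shuffle(batch)
    return batch

new_task = [("new", i) for i in range(10)]
old_tasks = [("old", i) for i in range(4)]
batch = make_batch(new_task, old_tasks, batch_size=8, replay_ratio=0.25,
                   rng=random.Random(0))
```

Choices such as the replay ratio and how the buffer is filled are the kind of implementation factors the abstract says govern whether replay succeeds.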


2605.26819 2026-05-27 cs.IR cs.AI

RAGEAR: Retrieval-Augmented Graph-Enhanced Academic Recommender

RAGEAR: 检索增强的图增强学术推荐器

Francesco Granata, Lorenzo Lamazzi, Misael Mongiovì, Francesco Poggi, Valeria Secchini

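AI summary: Proposes RAGEAR, a neurosymbolic recommender that combines dense retrieval with a knowledge graph and uses a graph-aware aggregation function to propagate chunk-level evidence to course-level recommendations, outperforming metadata baselines in academic course recommendation.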

AI中文摘要

我们提出了RAGEAR（检索增强的图增强学术推荐器），一种用于学术课程推荐的神经符号推荐系统。RAGEAR将完整讲座转录本的密集检索与符号知识图谱相结合，该图谱建模课程、课时、转录本片段、学分、学习计划和课程体系信息。知识图谱支持基于结构化约束（如学分、学科、学习计划和先修课程）的符号过滤和情境化。与基于元数据的方法不同，它通过检索与学生查询语义对齐的转录本片段来利用细粒度的教学内容。主要贡献是一种图感知聚合函数，它将片段级证据传播到课程级推荐。得分结合了三个因素：与课程相关的检索相似性份额、其相关片段的基于排名的强度以及证据在各课时之间的分布。我们通过人工评估样本和大规模基于LLM的相关性评估，在152个学生类查询上评估了RAGEAR。结果表明，讲座转录本优于仅元数据检索，并且RAGEAR进一步提高了基于转录本的归一化SumP基线的排名质量，尤其是在排名靠前的推荐中。

英文摘要

We present RAGEAR (Retrieval-Augmented Graph-Enhanced Academic Recommender), a neurosymbolic recommender system for academic course recommendation. RAGEAR combines dense retrieval over full lecture transcripts with a symbolic Knowledge Graph modelling courses, lessons, transcript chunks, credits, study plans, and curricular information. The Knowledge Graph supports symbolic filtering and contextualisation based on structured constraints, such as credits, academic disciplines, study plans, and prerequisites. Unlike metadata-based approaches, it exploits fine-grained instructional content by retrieving transcript chunks semantically aligned with a student's query. The main contribution is a graph-aware aggregation function that propagates chunk-level evidence to course-level recommendations. The score combines three factors: the share of retrieved similarity associated with a course, the rank-based strength of its relevant chunks, and the distribution of evidence across lessons. We evaluate RAGEAR on 152 student-like queries through a human evaluation sample and a large-scale LLM-based relevance assessment. Results show that lecture transcripts improve over metadata-only retrieval, and that RAGEAR further improves ranking quality over a transcript-based normalized SumP baseline, especially for top-ranked recommendations.


2605.26808 2026-05-27 cs.LG cs.AI cs.IT math.IT

Innovation: An Almost Characterization of Hallucination

创新：幻觉的几乎刻画

Nishant P. Das, Piyush Srivastava

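AI summary: Introduces an "innovation" property to characterize why hallucination in large language models is unavoidable, proves that innovation and hallucination are almost equivalent, and derives new lower bounds on the hallucination rate from the innovation rate.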

AI中文摘要

幻觉是大语言模型（LLMs）的一个核心局限，大量工作致力于理解和缓解它。为此，Kalai 和 Vempala（STOC 2024）引入了一个概率框架来形式化校准和幻觉，并证明高概率下，校准的 LLM 大致以“缺失质量”（衡量训练数据相对于其来源的不完整程度）的速率产生幻觉。这引出了两个基本问题：(i) 校准的 LLM 的什么属性使得幻觉不可避免？(ii) 能否通过放弃校准来避免幻觉？我们通过引入一个更简单的属性——我们称之为“创新”——来回答这些问题，该属性衡量模型产生训练数据之外输出的倾向。我们证明，创新由 Kalai 和 Vempala 识别的幻觉条件蕴含，并且进一步，它是幻觉的几乎刻画：幻觉蕴含创新，反之，创新高概率地蕴含幻觉。我们还基于“创新率”给出了幻觉率的下界，并通过将创新率与缺失质量联系起来，获得了基于缺失质量的新的幻觉率下界，扩展了 Kalai 和 Vempala 的结果。

英文摘要

Hallucination is a central limitation of large language models (LLMs), and substantial effort has been devoted to understanding and mitigating it. Towards this, Kalai and Vempala (STOC 2024) introduced a probabilistic framework formalizing calibration and hallucination, and showed that, with high probability, calibrated LLMs hallucinate roughly at the rate of the "missing mass", a measure of how incomplete the training data is relative to its source. This raises two fundamental questions: (i) what property of a calibrated LLM makes hallucinations unavoidable? and (ii) can hallucinations be avoided by giving up calibration? We answer these questions by introducing a simpler property we call innovation that measures the tendency of a model to produce outputs outside the training data. We show that innovation is implied by the condition for hallucination identified by Kalai and Vempala, and, further, that it is an almost characterization of hallucination: hallucination implies innovation, and conversely, innovation implies hallucination with high probability. We also provide lower bounds on the hallucination rate based on the "innovation rate", and by relating innovation rate back to missing mass, we obtain new hallucination rate lower bounds based on missing mass that extend the results of Kalai and Vempala.
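
The "missing mass" in the abstract is the probability mass of items that never appear in the training sample. A classical way to estimate it is the Good-Turing estimator, the fraction of observations that occur exactly once. The sketch below shows that estimator only; it is offered as background under that assumption and does not reproduce the paper's innovation rate or its bounds.

```python
from collections import Counter

def good_turing_missing_mass(samples):
    """Estimate the probability mass of unseen items as (#singletons) / n."""
    if not samples:
        raise ValueError("need at least one sample")
    counts = Counter(samples)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(samples)

estimate = good_turing_missing_mass(["a", "a", "b", "c"])
```

Here two of the four observations are singletons, so the estimate is 0.5: a sample dominated by facts seen only once suggests that much of the source distribution is still unseen.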


2605.26807 2026-05-27 cs.SE cs.AI

HTMLCure: Turning Browser Experience into State Guided Repair for Interactive HTML

HTMLCure：将浏览器体验转化为面向交互式HTML的状态引导修复

Jiajun Wu, Jian Yang, Tuney Zheng, Wei Zhang, Haowen Wang, Yihang Lou, Xianglong Liu

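AI summary: Proposes HTMLCure, a framework that uses browser-interaction execution, state-aware diagnosis, and a closed-loop repair engine to select and repair repairable pages from a large pool of HTML pages, markedly improving SFT data quality and model performance.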

Comments: 27 pages, 11 figures. Code: https://github.com/wuyuVerse/HTMLCure

AI中文摘要

LLM现在可以生成完整的HTML页面，但其中许多页面仅在表面上正确：它们渲染一次，然后在滚动、悬停、点击、调整大小或游戏过程中失败。基于截图的评估可能遗漏这些失败，而过滤会丢弃许多仍然可修复的页面。我们引入了HTMLCure，一个浏览器体验框架，在系统与页面交互后评估HTML。评估器跨视口和交互状态执行页面，记录确定性的浏览器证据，并向VLM提供来自执行轨迹的精选关键帧，而非孤立截图。相同的状态信号驱动闭环修复引擎：HTMLCure诊断当前页面，选择特定状态的修复家族，再次运行每个候选页面，并导出质量清理后的页面用于SFT。在97K提示语料库上，这将直接可用的种子扩展为63703个质量清理页面的候选池，从中我们构建了最终的40K页面精炼SFT集。在相同骨干和训练方案下，HTMLCure-27B-Refined在HTMLBench-400上达到50.6分，确定性测试用例通过率为45.2%，与Kimi-K2.6和GPT-5.4等强参考行处于相同性能区间。在发布的MiniAppBench验证集上，它达到81.2的平均分，比原始27B SFT提高15.3分，接近强参考系统的水平。

英文摘要

LLMs can now produce full HTML pages, but many of those pages are only superficially correct: they render once, then fail under scroll, hover, click, resize, or gameplay. Evaluation from screenshots can miss these failures, and filtering discards many pages that are still repairable. We introduce HTMLCure, a browser experience framework that evaluates HTML after the system has interacted with it. The evaluator executes the page across viewports and interaction states, records deterministic browser evidence, and gives the VLM curated keyframes from the executed trajectory rather than isolated screenshots. The same state signal drives a closed loop repair engine: HTMLCure diagnoses the current page, chooses a state specific repair family, runs each candidate again, and exports quality cleared pages for SFT. On a 97K prompt corpus, this expands the directly usable seed into a candidate pool of 63703 quality cleared pages, from which we construct the final refined SFT set of 40K pages. Under the same backbone and training recipe, HTMLCure-27B-Refined reaches 50.6 on HTMLBench-400 with 45.2% deterministic test case pass, placing it in the same performance band as strong reference rows such as Kimi-K2.6 and GPT-5.4. On the released MiniAppBench validation split, it reaches 81.2 average, improving raw 27B SFT by 15.3 points and approaching the level of strong reference systems.


2605.26802 2026-05-27 cs.LG

PATE-TabTransGAN: Differentially Private Synthetic Tabular Data Generation via Transformer-Based Student Discrimination

PATE-TabTransGAN：基于Transformer学生鉴别的差分隐私合成表格数据生成

M. Youssef, M. Woźniak

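AI summary: Proposes PATE-TabTransGAN, which combines the Private Aggregation of Teacher Ensembles (PATE) mechanism with a Transformer-based student discriminator to generate high-quality synthetic tabular data under formal differential privacy guarantees, attaining the best or tied-best AUROC on four benchmark datasets.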

Comments: 16 pages, 3 figures, 4 tables. Submitted for publication

AI中文摘要

在正式差分隐私保证下生成高保真合成表格数据仍然是一个开放挑战。提供强理论保护的方法通常牺牲了真实合成所需的特征间依赖建模，而擅长捕获复杂列关系的架构仅提供经验隐私保证。我们提出PATE-TabTransGAN，一个生成框架，将教师集成私有聚合（PATE）机制与基于Transformer的学生鉴别器相结合，以共同满足这两个要求，并采用GNMax RDP会计进行数值稳定的隐私核算。在不相交分区上训练的Logistic回归教师集成通过噪声聚合标签监督学生，残差生成器针对这个差分隐私学生进行优化，通过后处理继承正式的(ε, δ)-DP保证。将PATE-TabTransGAN与PATE-GAN、DP-GAN和DP-CTGAN（被认为是差分隐私表格合成的最先进方法）进行比较。在四个表格基准（Adult、Breast、Cardio、Cervical）上进行的实验证实了所提方法的高质量：PATE-TabTransGAN在所有四个数据集上达到最佳或并列最佳的AUROC。在AUCPR上，它在Cardio上与最强基线持平，在Cervical上领先，在Breast上落后；在Adult上，我们证明AUCPR对正类惯例高度敏感，观察到的差距与评估流程之间的惯例差异一致，而非合成缺陷。

英文摘要

Generating high-fidelity synthetic tabular data under formal differential privacy guarantees remains an open challenge. Methods that provide strong theoretical protection typically sacrifice the modeling of inter-feature dependencies required for realistic synthesis, while architectures that excel at capturing complex column relationships offer only empirical privacy guarantees. We present PATE-TabTransGAN, a generative framework that integrates the Private Aggregation of Teacher Ensembles (PATE) mechanism with a Transformer-based student discriminator to jointly address both requirements, and employs a GNMax RDP accountant for numerically stable privacy accounting. An ensemble of Logistic Regression teachers trained on disjoint partitions supervise the student via noisy-aggregated labels, and a residual generator is optimized against this differentially private student, inheriting formal (ε, δ)-DP guarantees by post-processing. PATE-TabTransGAN was compared with PATE-GAN, DP-GAN, and DP-CTGAN, considered state-of-the-art in differentially private tabular synthesis. Experiments conducted on four tabular benchmarks (Adult, Breast, Cardio, Cervical) confirmed the high quality of the proposed method: PATE-TabTransGAN attains the best or tied-best AUROC on all four datasets. On AUCPR it matches the strongest baseline on Cardio, leads on Cervical, and trails on Breast; on Adult, we demonstrate that AUCPR is highly sensitive to positive-class convention, and that the observed gap is consistent with a convention difference between evaluation pipelines rather than a synthesis deficit.
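
The noisy label aggregation at the heart of PATE can be sketched with the Gaussian noisy-max (GNMax) rule: count the teachers' votes per class, add independent Gaussian noise to each count, and release the arg max. The code below shows only this aggregation step under assumed names (`gnmax`, `sigma`); the RDP privacy accounting and the GAN training loop described in the abstract are not shown.

```python
import random
from collections import Counter

def gnmax(teacher_votes, num_classes, sigma, rng):
    """Return the class with the largest Gaussian-noised vote count."""
    counts = Counter(teacher_votes)
    noisy = [counts.get(c, 0) + rng.gauss(0.0, sigma) for c in range(num_classes)]
    return max(range(num_classes), key=noisy.__getitem__)

votes = [1, 1, 0, 1]  # one label per teacher for a single query
noiseless_label = gnmax(votes, num_classes=2, sigma=0.0, rng=random.Random(0))
noisy_label = gnmax(votes, num_classes=2, sigma=5.0, rng=random.Random(0))
```

With `sigma=0.0` the rule reduces to a plain majority vote; larger `sigma` gives stronger privacy per query at the cost of noisier supervision for the student.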


2605.26801 2026-05-27 cs.CL

Psychological Constructs in Shared Semantic Space

共享语义空间中的心理构念

Hubert Plisiecki

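AI summary: Proposes a framework that makes psychological constructs from different instruments and research traditions semantically comparable by representing them as directions in a shared word-embedding space and estimating construct-specific semantic gradients from text-outcome associations with Supervised Semantic Differential.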

AI中文摘要

心理构念通常在不同的测量工具、数据集和研究传统中进行测量，这使得直接比较变得困难。本文提出了一个框架，通过将心理构念表示并比较为共享词嵌入空间中的方向，使这些构念在语义上具有可比性。使用监督语义微分，我们从文本-结果关联中估计构念特定的语义梯度，并将其投影到理论驱动的参考轴上。作为初始测试案例，我们使用效价、唤醒度和支配度（VAD）作为情感坐标系。首先，我们从英语词汇级情感规范中恢复可解释的VAD方向。其次，我们将27个GoEmotions类别的语义梯度投影到该空间中，并恢复预期的情感组织，特别是在效价和唤醒度维度上。第三，我们将相同程序应用于源自IPIP-NEO-300项目-因子关联的大五人格领域和子域。领域层面的定位大体一致，而子域层面的结果更具探索性，因为它们依赖于稀疏的问卷文本。结果表明，只要语义定位的稳定性和可解释性得到评估，嵌入空间可以支持在其他不可比较的心理测量之间进行构念层面的比较。

英文摘要

Psychological constructs are often measured in separate instruments, datasets, and research traditions, which makes direct comparison difficult. This paper proposes a framework for making such constructs semantically commensurate by representing and comparing them as directions in a shared word-embedding space. Using Supervised Semantic Differential, we estimate construct-specific semantic gradients from text-outcome associations and project them onto theoretically motivated reference axes. As an initial test case, we use Valence, Arousal, and Dominance (VAD) as an affective coordinate system. First, we recover interpretable VAD directions from English word-level affective norms. Second, we project semantic gradients for 27 GoEmotions categories into this space and recover the expected organization of emotions, especially along valence and arousal. Third, we apply the same procedure to Big Five personality domains and facets derived from IPIP-NEO-300 item-factor associations. Domain-level placements are broadly coherent, while facet-level results are more exploratory because they rely on sparse questionnaire text. The results suggest that embedding spaces can support construct-level comparison across otherwise incommensurable psychological measurements, provided that semantic placements are assessed for stability and interpretability.
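
Projecting a construct's direction onto reference axes can be illustrated with cosine similarity. The sketch assumes that a semantic gradient and the VAD axes are already available as vectors in the same space; the three-dimensional toy vectors and the names `cosine` and `place_on_axes` are hypothetical, and estimating the gradient itself is not shown.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def place_on_axes(gradient, axes):
    """Coordinate of a construct direction on each named reference axis."""
    return {name: cosine(gradient, axis) for name, axis in axes.items()}

axes = {
    "valence": [1.0, 0.0, 0.0],
    "arousal": [0.0, 1.0, 0.0],
    "dominance": [0.0, 0.0, 1.0],
}
placement = place_on_axes([3.0, 4.0, 0.0], axes)
```

In a real embedding space the reference axes are generally not orthogonal, so the resulting coordinates are best read as similarities to each axis rather than as an exact decomposition.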

URL PDF HTML ☆

赞 0 踩 0

2605.26797 2026-05-27 cs.LG cs.CL

Latent Recurrent Transformer: Architecture Exploration, Training Strategies, and Scaling Behavior

潜在循环Transformer：架构探索、训练策略与扩展行为

Zeyi Huang, Xuehai He, LiLiang Ren, Yiping Wang, Baolin Peng, Hao Cheng, Shuohang Wang, Pengcheng He, Jianfeng Gao, Yong Jae Lee, Yelong Shen

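AI summary: Proposes the Latent Recurrent Transformer (LRT), which reuses the previous token's high-level hidden state as memory through a cross-layer recurrent latent pathway, without pause tokens or extra depth loops; it trains in parallel at roughly 2 times baseline compute and improves language-modeling loss and in-context learning under matched effective compute while adding as little as 0.3% parameters.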

AI中文摘要

我们研究潜在循环Transformer（LRT），一种自回归Transformer的轻量级增强，它重用来自前一个token的高层源层隐藏状态作为下一个token的循环记忆。由于该源状态在普通解码过程中已经计算，LRT跨位置添加跨层循环潜在路径，无需插入暂停token或额外深度循环，并且保留了标准注意力机制和KV-cache接口。为了在不顺序展开Transformer的情况下大规模预训练这种循环，我们引入了交错并行训练：一次完整的全序列初始化前向传播构建共享缓冲区；然后不相交的位置子集并行细化并写回，使得所有token在约2倍基线计算下获得循环记忆感知的监督。在nanochat风格的主干网络和广泛的每参数token预算范围内，LRT在匹配有效计算下改进了语言建模损失和上下文学习，同时仅增加0.3%的参数。

英文摘要

We study Latent Recurrent Transformer (LRT), a lightweight augmentation of autoregressive transformers that reuses a high-level source-layer hidden state from the previous token as recurrent memory for the next token. Because this source state is already computed during ordinary decoding, LRT adds a cross-layer recurrent latent pathway across positions without inserting pause tokens or extra depth loops, and the standard attention mechanism and KV-cache interface are preserved. To pretrain this recurrence at scale without sequentially unrolling the transformer, we introduce interleaved parallel training: a single full-sequence initialization forward pass builds a shared buffer; then disjoint position subsets are refined in parallel and written back, so that all tokens receive recurrent-memory-aware supervision at roughly 2 times baseline compute. Across nanochat style backbones and a wide range of tokens-per-parameter budgets, LRT improves both language-modeling loss and in-context learning under matched effective compute while adding as little as 0.3% parameters.
