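arXivDaily: a daily academic digest synchronized with the full arXiv data, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and related fields.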

2605.24776 2026-05-26 cs.CV

How Noisy Poses Break Inverse Dynamics: Analysis and Mitigation for Video-Based Joint Torque Estimation

噪声姿态如何破坏逆动力学：基于视频的关节力矩估计的分析与缓解

Donghyun Kim, Chanyoung Kim, Eunseo Jeong, Youngjoong Kwon, Seong Jae Hwang

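Affiliations: Emory University; Yonsei University

AI summary: This paper systematically analyzes how noise in 3D human pose estimation is amplified into joint torque error through inverse dynamics, introduces the SMPL-Dynamics module, and reduces torque error by 93% through differentiable pose refinement.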

AI中文摘要

单目3D人体姿态估计的最新进展使得从视频中实现精确的身体跟踪成为可能。然而，由于逆动力学中的噪声放大，将这些运动学估计转化为物理量（如关节力矩）仍然具有挑战性。在这项工作中，我们系统分析了姿态估计噪声如何通过逆动力学管道传播。我们提出了三个关键发现：（1）通过数值微分计算关节力矩时，姿态噪声被放大约1000倍；（2）近端关节（脊柱、髋部）对噪声的敏感度比远端关节（手腕、手）高10倍；（3）在微分之前进行低通滤波可显著减少这种放大。为了支持这一分析，我们开发了SMPL-Dynamics，这是一个用于SMPL人体模型的完全可微逆动力学模块，无需外部物理模拟器。我们的模块支持端到端梯度计算，并通过可微姿态优化证明了这一点，该优化将力矩误差降低了93%，而姿态变化可忽略不计。

英文摘要

Recent advances in monocular 3D human pose estimation enable accurate body tracking from video. However, translating these kinematic estimates into physical quantities, such as joint torques, remains challenging due to noise amplification through inverse dynamics. In this work, we provide a systematic analysis of how pose estimation noise propagates through the inverse dynamics pipeline. We present three key findings: (1) pose noise is amplified by approximately 1,000x when computing joint torques via numerical differentiation, (2) proximal joints (spine, hips) are up to 10x more sensitive to noise than distal joints (wrists, hands), and (3) low-pass filtering before differentiation substantially reduces this amplification. To enable this analysis, we develop SMPL-Dynamics, a fully differentiable inverse dynamics module for the SMPL body model that requires no external physics simulators. Our module supports end-to-end gradient computation, and we demonstrate this through differentiable pose refinement, which reduces torque error by 93% with negligible change in pose.
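Findings (1) and (3) of the abstract above can be illustrated with a toy experiment. The sketch below is not the paper's SMPL-Dynamics module: it only double-differentiates a single hypothetical joint angle, since joint acceleration is the noise-sensitive quantity that inverse dynamics feeds into torque. The frame rate, noise level, test signal, and moving-average filter are all assumptions chosen for illustration.

```python
import math
import random


def second_derivative(x, dt):
    """Central finite-difference estimate of the second time derivative."""
    return [(x[i - 1] - 2.0 * x[i] + x[i + 1]) / dt**2 for i in range(1, len(x) - 1)]


def moving_average(x, half_width):
    """Centered moving-average low-pass filter; keeps only fully covered samples."""
    width = 2 * half_width + 1
    return [sum(x[i:i + width]) / width for i in range(len(x) - width + 1)]


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


random.seed(0)
dt = 1.0 / 30.0      # assumed 30 fps video
sigma = 0.01         # assumed pose noise (rad)
half_width = 3       # assumed filter half-width (7-sample window)
t = [i * dt for i in range(300)]
clean = [math.sin(math.pi * ti) for ti in t]          # hypothetical 0.5 Hz joint angle
noisy = [c + random.gauss(0.0, sigma) for c in clean]

true_acc = second_derivative(clean, dt)
raw_acc = second_derivative(noisy, dt)
filt_acc = second_derivative(moving_average(noisy, half_width), dt)

raw_err = rms([a - b for a, b in zip(raw_acc, true_acc)])
# The filtered series is shorter, so compare against the matching interior samples.
interior = true_acc[half_width:len(true_acc) - half_width]
filt_err = rms([a - b for a, b in zip(filt_acc, interior)])

print(f"noise gain without filtering: {raw_err / sigma:.0f}x")
print(f"noise gain with filtering:    {filt_err / sigma:.0f}x")
```

The gain grows as 1/dt^2, so higher frame rates make unfiltered differentiation worse, not better, unless the noise shrinks accordingly.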

2605.24775 2026-05-26 cs.AI cs.MA

PRIMA: Operational Patterns for Resilient Multi-Agent Research with Verifiable Identity and Convergent Feedback

PRIMA: 具有可验证身份和收敛反馈的弹性多智能体研究的操作模式

Sasank Annapureddy

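Affiliations: GitHub

AI summary: To address the failure modes of long-running multi-agent LLM systems, the paper proposes the PRIMA framework, comprising a resilience-and-recovery layer, a sub-agent operating discipline, and a multi-phase application pattern for structured engineering deliverables, and grounds it in a Graph Isomorphism case study.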

Comments 11 pages. Single-author preprint. Supplementary case-study report (Graph Isomorphism algorithm proposal with three theorems, five conjectures, complete complexity analysis, and hard-instance evaluation) available at https://spockstein.github.io/prima/case-study-graph-isomorphism.html

AI中文摘要

将LLM作为协调的多智能体研究系统运行数小时，会暴露出单次评估无法发现的故障模式：上游提供商无预警地限制服务，子智能体使任务偏离以适应可用工具，叙述机制而非使用它，以自我道歉开始修订迭代，或将上游上下文视为可执行指令。我们提出PRIMA，其主要贡献是三种应对这些故障模式的操作模式：(1) 弹性与恢复层，检测上游速率限制信号，将类型化的暂停记录持久化到磁盘，并在进程重启后恢复长时间运行的任务而不重新执行已收敛的工作；(2) 子智能体操作规范，将任务保真度、工具使用、修订和步骤间上下文边界规范编码为结构化的提示层；(3) 用于结构化工程交付的多阶段应用模式，将正交的草稿步骤与最终综合前的显式跨文档协调过程配对。这些模式基于一个基础协议：具有显式收敛标准的研究程序规范语言、双指标评分引擎（LLM评判的评分标准加沙盒代码）、外部元优化循环、事件驱动持久化、基于钩子的中间件、上下文压缩和多提供商LLM抽象。智能体身份来源于素数幂，提供无冲突标识符和无需中央注册表的可轻松验证的集群成员资格。理论保证包括$O(k)$验证、$O(V+E)$ DAG验证以及由算术基本定理保证的身份无冲突。一个图同构案例研究将架构主张落实到生成的产物中：一个六步协议，产生了一篇研究论文，提出了一种新的规范形式算法，包含三个定理和五个猜想。

英文摘要

Operating LLMs as coordinated multi-agent research systems over multi-hour runs surfaces failure modes that single-shot evaluation cannot: upstream providers throttle without warning, sub-agents drift the task to fit accessible tools, narrate machinery instead of using it, open revision iterations with self-apology, or treat upstream context as executable directives. We present PRIMA, whose primary contributions are three operational patterns for surviving these failure modes: (1) a resilience-and-recovery layer that detects upstream rate-limit signals, persists a typed pause record to disk, and resumes long-running runs without re-executing converged work even across process restarts; (2) a sub-agent operating discipline encoding task-fidelity, tool-use, revision, and inter-step context-boundary norms as a structural prompt layer; (3) a multi-phase application pattern for structured engineering deliverables pairing orthogonal draft steps with an explicit cross-document harmonization pass before final synthesis. These sit atop a foundational protocol: a research-program specification language with explicit convergence criteria, a dual-metric scoring engine (LLM-judged rubric plus sandboxed code), an outer meta-optimization loop, event-driven persistence, hook-based middleware, context compaction, and a multi-provider LLM abstraction. Agent identities derive from prime powers, giving collision-free identifiers and trivially-verifiable cluster membership without a central registry. Theoretical guarantees include $O(k)$ verification, $O(V+E)$ DAG validation, and identity collision freedom by the Fundamental Theorem of Arithmetic. A Graph Isomorphism case study grounds the architectural claims in a generated artifact: a six-step protocol that produced a research paper proposing a new canonical-form algorithm with three theorems and five conjectures.
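The abstract above does not spell out how identities are encoded as prime powers, so the following is only one plausible reading: each cluster owns a distinct prime p, and the k-th agent in that cluster is identified by p**k. The function names and this encoding are assumptions for illustration. Under that reading, unique factorization means two agents collide only if they share both p and k, and membership can be checked with k divisions.

```python
def agent_id(cluster_prime, k):
    """Hypothetical encoding: the k-th agent of a cluster gets identifier p**k."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return cluster_prime ** k


def membership_index(identifier, cluster_prime):
    """Return k if identifier == p**k for some k >= 1, otherwise None.

    The loop performs one division per factor of p, i.e. O(k) steps.
    """
    k = 0
    while identifier > 1 and identifier % cluster_prime == 0:
        identifier //= cluster_prime
        k += 1
    return k if identifier == 1 and k >= 1 else None


ids = {agent_id(p, k) for p in (2, 3, 5) for k in range(1, 6)}
print(len(ids))                    # 15 distinct identifiers, no collisions
print(membership_index(81, 3))     # 4, because 81 == 3**4
print(membership_index(81, 5))     # None, 81 is not a power of 5
```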

2605.24774 2026-05-26 cs.LG physics.comp-ph

Hermite-NGP: Gradient-Augmented Hash Encoding for Learning PDEs

Hermite-NGP：用于学习PDE的梯度增强哈希编码

Jinjin He, Zhiqi Li, Sinan Wang, Bo Zhu

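Affiliations: Georgia Institute of Technology, Atlanta, GA, USA

AI summary: Proposes Hermite-NGP, a gradient-augmented multi-resolution hash encoding that explicitly stores function values and mixed partial derivatives at hash grid vertices and uses Hermite interpolation to evaluate gradients analytically, giving fast and accurate spatial derivatives for neural PDE solvers. Combined with a multi-resolution curriculum training strategy, it achieves up to approximately 20 times lower error and 2 to 10 times shorter convergence time on 2D and 3D PDE benchmarks.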

Comments Accepted by ICML 2026. Project page: https://jinjinhe2001.github.io/hermite-ngp/

AI中文摘要

我们提出Hermite-NGP，一种梯度增强的多分辨率哈希编码，旨在实现神经PDE求解器空间导数的快速准确计算。与现有依赖自动微分或有限差分且存在不稳定或高成本的NGP方法不同，Hermite-NGP在哈希网格顶点处显式存储函数值和混合偏导数，从而通过Hermite插值实现梯度、雅可比矩阵和海森矩阵的完全解析计算。该设计在保持NGP的效率和空间自适应性的同时，支持高达二阶的解析微分算子。我们进一步引入一种类似于多重网格V-cycle的多分辨率课程训练策略，以实现从粗到细的优化。在一系列2D和3D PDE基准测试中，Hermite-NGP相比先前的神经PDE方法实现了高达约20倍的误差降低，并将与其他求解器相比的收敛时间缩短了2到10倍，对于多达1700万参数的模型，每个epoch的训练时间低至3.5毫秒。

英文摘要

We propose Hermite-NGP, a gradient-augmented multi-resolution hash encoding designed to enable fast and accurate computation of spatial derivatives for neural PDE solvers. Unlike existing NGP-based approaches that rely on automatic differentiation or finite differences and suffer from instability or high cost, Hermite-NGP explicitly stores function values and mixed partial derivatives at hash grid vertices, allowing fully analytic evaluation of gradients, Jacobians, and Hessians via Hermite interpolation. This design preserves the efficiency and spatial adaptivity of NGP while supporting analytic differential operators up to second order. We further introduce a multi-resolution curriculum training strategy analogous to multigrid V-cycles to enable coarse-to-fine optimization. Across a range of 2D and 3D PDE benchmarks, Hermite-NGP achieves up to approximately 20 times lower error than prior neural PDE methods, and reduces wall-clock convergence time by 2 to 10 times compared to other solvers, with per-epoch training times as low as 3.5 ms for models with up to 17M parameters.
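The core idea in the abstract above, storing values and derivatives at grid vertices so that derivatives of the interpolant are available in closed form, can be sketched in one dimension. This is a minimal illustration on a single dense cell, not the paper's multi-resolution hash grid with mixed partials; the function name and argument layout are assumptions.

```python
def hermite_eval(x, x0, x1, f0, f1, d0, d1):
    """Return (value, derivative) of the cubic Hermite interpolant at x in [x0, x1].

    f0, f1 are the stored function values and d0, d1 the stored derivatives
    at the two cell vertices x0 and x1.
    """
    h = x1 - x0
    t = (x - x0) / h
    # Cubic Hermite basis functions in the local coordinate t.
    h00, h10 = 2 * t**3 - 3 * t**2 + 1, t**3 - 2 * t**2 + t
    h01, h11 = -2 * t**3 + 3 * t**2, t**3 - t**2
    # Their derivatives with respect to t (no autodiff or finite differences).
    g00, g10 = 6 * t**2 - 6 * t, 3 * t**2 - 4 * t + 1
    g01, g11 = -6 * t**2 + 6 * t, 3 * t**2 - 2 * t
    value = h00 * f0 + h10 * h * d0 + h01 * f1 + h11 * h * d1
    deriv = (g00 * f0 + g01 * f1) / h + g10 * d0 + g11 * d1
    return value, deriv


# f(x) = x**3 on the cell [0, 2]: values 0 and 8, derivatives 0 and 12.
value, deriv = hermite_eval(1.0, 0.0, 2.0, 0.0, 8.0, 0.0, 12.0)
print(value, deriv)  # 1.0 3.0, matching x**3 and 3*x**2 at x = 1
```

Because a cubic Hermite interpolant reproduces cubics exactly, both the value and the analytic derivative match the true function in this example.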

2605.24773 2026-05-26 cs.AI

Uncertainty Decomposition via Cyclical SG-MCMC and Soft-label Learning for Subjective NLP

通过循环SG-MCMC和软标签学习进行主观NLP中的不确定性分解

Keito Inoshita, Takato Ueno

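Affiliations: Faculty of Business and Commerce; Data Science and AI Innovation Research Promotion Center; Graduate School of Data Science

AI summary: Proposes combining cyclical stochastic gradient Markov chain Monte Carlo (cSG-MCMC) with soft-label learning to evaluate uncertainty along multiple axes in emotion classification, outperforming Monte Carlo Dropout and Deep Ensemble on the GoEmotions benchmark.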

AI中文摘要

情感分类中标注者的分歧反映了情感概念固有的模糊性，对于主观NLP中的预测质量评估至关重要。然而，先前没有工作将软标签学习与贝叶斯深度学习相结合，以评估包括标注者分布保真度在内的多个轴上的不确定性。我们在冻结的RoBERTa上通过循环随机梯度马尔可夫链蒙特卡洛（cSG-MCMC）训练一个线性头，在五轴评估下以软标签目标针对经验标注者分布。在28情感的GoEmotions基准上，所提出的方法在三个轴上同时优于蒙特卡洛Dropout和深度集成——标注者分布的Jensen-Shannon散度（JSD）、每个情感偶然不确定性与分歧之间的Spearman相关性，以及选择性预测的风险-覆盖率曲线下面积（AURC）和ROC曲线下面积（AUROC）——表明独立的轴可以从一个后验中联合获得。事后温度缩放表现出双向效应，建立了硬标签校准和标注者JSD作为独立维度，并激励联合报告作为诚实协议。

英文摘要

Annotator disagreement in emotion classification reflects ambiguity intrinsic to emotion concepts and is essential for predictor-quality assessment in subjective NLP. Yet no prior work integrates soft-label learning with Bayesian deep learning to evaluate uncertainty along axes including annotator-distribution fidelity. We train a linear head on a frozen RoBERTa via cyclical stochastic gradient Markov chain Monte Carlo (cSG-MCMC), targeting the empirical annotator distribution with a soft-label objective under a five-axis evaluation. On the 28-emotion GoEmotions benchmark, the proposed method outperforms Monte Carlo Dropout and Deep Ensemble simultaneously on three axes -- Jensen-Shannon divergence (JSD) to the annotator distribution, Spearman correlation between per-emotion aleatoric uncertainty and disagreement, and selective-prediction Area Under the Risk-Coverage Curve (AURC) and Area Under the ROC Curve (AUROC) -- showing independent axes are jointly attainable from one posterior. Post-hoc temperature scaling exhibits a bidirectional effect, establishing hard-label calibration and annotator-JSD as independent dimensions and motivating joint reporting as an honest protocol.
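Two of the quantities named in the abstract above, the empirical annotator distribution used as a soft label and the Jensen-Shannon divergence to it, are easy to state in code. The sketch below assumes single-label votes per annotator and natural logarithms; it does not reproduce the cSG-MCMC sampler, and the helper names are hypothetical.

```python
import math


def soft_label(votes, num_classes):
    """Empirical annotator distribution from a list of class indices."""
    counts = [0] * num_classes
    for v in votes:
        counts[v] += 1
    return [c / len(votes) for c in counts]


def kl(p, q):
    """KL divergence in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    """Jensen-Shannon divergence: symmetric, bounded above by log(2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def soft_cross_entropy(target, predicted, eps=1e-12):
    """Training loss against the soft label instead of a one-hot target."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))


target = soft_label([0, 0, 1], num_classes=3)   # two annotators chose class 0, one chose class 1
prediction = [0.6, 0.3, 0.1]                    # hypothetical posterior-mean prediction
print(jsd(target, prediction), soft_cross_entropy(target, prediction))
```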

2605.24771 2026-05-26 cs.CV cs.AI cs.LG

From Theory to Decision Rule: Calibrating the Noisy-Label Crossover for Vision-Language Model Weak Supervision Across Three Medical-Imaging Benchmarks

从理论到决策规则：校准视觉-语言模型弱监督的噪声标签交叉点——基于三个医学影像基准

Bruce Changlong Xu, Jose James, Alexander Ryu

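Affiliations: Department of Computer Science, Stanford University

AI summary: Calibrates the theoretically predicted noisy-label crossover on three medical-imaging benchmarks and derives a decision rule that can be applied with as few as 10-20 gold labels.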

Comments 5 pages, 2 figures, 4 tables

AI中文摘要

经典的噪声标签理论预测，弱监督下的下游性能上限是标注者的准确率，这意味着一个尖锐的交叉点：一旦金标训练的分类器达到标注者的水平，弱标签就会从帮助变为伤害。该预测是理论性的；缺少的是将其转化为现代基础模型标注者的实例级陈述的基准校准。我们针对BiomedCLIP生成的弱标签，在三个医学影像基准（PCAM、ISIC、NIH-CXR）和六个跨越11倍参数范围的下游架构上提供了这样的校准。理论预测的交叉点出现在PCAM上约100个样本，ISIC上20-50个，NIH-CXR上250-500个；交叉点以上的弱标签使AUC降低高达-0.10。对于五个预训练架构中的四个，交叉点位置与架构无关，而一个家族内的DenseNet扫描（2.5倍参数，相同预训练）支持了标注者（而非学生）是主要约束的观点。该校准进而产生一个可在10-20个金标标签下操作的决策规则：比较仅金标AUC与用户金标集上的VLM准确率。NIH-CXR上的结构化与随机噪声符号翻转表明，该界限的仅速率形式是不完整的，并确定了一个具体的改进（标签空间投影），未来的基准可以设计来测试它。

英文摘要

Classical noisy-label theory predicts that downstream performance under weak supervision is bounded above by the labeler's accuracy, implying a sharp crossover: once a gold-trained classifier matches the labeler, weak labels stop helping and start hurting. The prediction is theoretical; what is missing is a benchmark calibration that turns it into an instance-level statement for modern foundation-model labelers. We provide such a calibration for BiomedCLIP-generated weak labels on three medical-imaging benchmarks (PCAM, ISIC, NIH-CXR) and six downstream architectures spanning an 11x parameter range. The crossover predicted by theory appears at $n_g \approx 100$ on PCAM, 20-50 on ISIC, and 250-500 on NIH-CXR; weak labels above the crossover degrade AUC by up to 0.10. The location is architecture-invariant for four of five pretrained architectures, and a within-family DenseNet sweep (2.5x parameters, identical pretraining) supports the view that the labeler, not the student, is the dominant constraint. The calibration in turn produces a decision rule operable from 10-20 gold labels: compare gold-only AUC to VLM accuracy on the user's gold set. A structured-vs-random noise sign flip on NIH-CXR shows that the rate-only formulation of the bound is incomplete and identifies a concrete refinement (label-space projection) that future benchmarks can be designed to test.
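The decision rule stated in the abstract above compares two numbers measured on the same small gold set. A minimal sketch follows; the direction of the rule (prefer weak labels only while the gold-only classifier is still below the labeler) is inferred from the crossover description, and the function names and toy data are assumptions.

```python
def accuracy(predicted, gold):
    """Fraction of gold examples the VLM labeler gets right."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def auc(scores, labels):
    """Pairwise AUC: probability a positive outscores a negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def use_weak_labels(gold_only_auc, vlm_accuracy):
    """Assumed rule: weak labels are worth adding only below the crossover."""
    return gold_only_auc < vlm_accuracy


gold = [1, 1, 0, 0]
vlm_pred = [1, 0, 0, 0]                    # hypothetical VLM labels on the gold set
held_out_scores = [0.9, 0.2, 0.8, 0.1]     # hypothetical gold-only classifier scores

print(use_weak_labels(auc(held_out_scores, gold), accuracy(vlm_pred, gold)))
```

With only 10-20 gold labels both estimates are noisy, so in practice the comparison would likely need a margin or repeated splits; the abstract does not specify one.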

2605.24770 2026-05-26 cs.LG cs.CV

Muon in Vision Transformers: Optimizer-Recipe Interactions and Gradient Spectra

Muon在视觉Transformer中的应用：优化器-训练方案交互与梯度谱

Ben S. Southworth, Shuai Jiang, Daniel McBride, Eric C. Cyr, Stephen Thomas

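Affiliations: Los Alamos National Laboratories; Sandia National Laboratories; Lehigh University

AI summary: Studies the Muon optimizer for vision transformer training, finding that it outperforms AdamW and that the gains depend on the data augmentation recipe. A singular-value analysis of gradients shows that Muon spreads gradient energy over a broader spectrum than AdamW in attention projections, and that removing heavy augmentation leads to late-training spectral concentration in deep feedforward blocks.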

Comments 25 pages, 15 figures

AI中文摘要

Muon是一种最近开发的矩阵感知优化器，在Transformer训练中表现出色，但其在视觉Transformer（ViT）中的行为尚不明确。我们研究Muon在ViT训练中的应用，主要在ImageNet-100和Pl@ntNet-300K上，与AdamW在涉及mixup、cutmix、平滑以及随机增强和擦除的标准视觉方案下进行比较。Muon始终优于AdamW，在长尾Pl@ntNet宏观top-1上尤其显著。这些增益也依赖于数据增强方案，Muon从高级和显著的数据增强技术中获益远大于AdamW。为了理解这种交互，我们分析了整个ViT中矩阵梯度的奇异值结构。在Muon训练中，去除重度数据增强会导致训练后期梯度矩阵的谱集中和模式坍塌，主要发生在深层MLP-down块中。在固定的“完整”增强方案下，Muon与AdamW最明显的对比出现在QKV梯度中，其中AdamW梯度能量集中在更窄的基上，而Muon将能量分散到更多的奇异模式上。因此，ViT中的Muon最好理解为一种优化器-数据增强交互。在固定方案下，Muon与AdamW最明显的区别在于注意力投影，其梯度由更宽的谱基组成。在Muon内部，完整的训练方案对于防止深层前馈块中的后期谱集中和模式坍塌很重要。我们进一步展示了在图像分割和掩码自编码器模型上训练ViT的效果，Muon在所有考虑的设置中均优于AdamW。

英文摘要

Muon is a recently developed matrix-aware optimizer that has shown strong results in transformer training, but its behavior in vision transformers (ViTs) is not yet well understood. We study Muon for ViT training, largely on ImageNet-100 and Pl@ntNet-300K, comparing against AdamW under standard vision recipes involving mixup, cutmix, smoothing, and random augmentation and erasing. Muon consistently outperforms AdamW, with especially large gains on long-tailed Pl@ntNet macro top-1. These gains are also recipe-dependent, where Muon benefits much more than AdamW from advanced and significant data augmentation techniques. To understand this interaction, we analyze the singular-value structure of matrix gradients throughout the ViT. Within Muon training runs, removing heavy data augmentation induces a late-training spectral concentration and mode collapse in gradient matrices, primarily in deep MLP-down blocks. Under a fixed "full" augmentation recipe, the clearest Muon-AdamW contrast appears instead in QKV gradients, where AdamW gradient energy remains concentrated in a much narrower basis while Muon spreads energy across substantially more singular modes. Muon in ViTs is therefore best understood as an optimizer-recipe interaction. Under a fixed recipe, Muon differs from AdamW most clearly in attention projections, where its gradients consist of a broader spectral basis. Within Muon, a full training recipe is important for preventing late spectral concentration and mode collapse in deep feedforward blocks. We further demonstrate efficacy in training ViTs on image segmentation and masked autoencoder models, where Muon outperforms AdamW in all settings considered.

2605.24769 2026-05-26 cs.CV cs.AI eess.IV

Leveraging pretrained RGB denoisers for hyperspectral image restoration

利用预训练RGB去噪器进行高光谱图像恢复

Daniele Picone, Mohamad Jouni, Mauro Dalla-Mura

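Affiliations: Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-Lab

AI summary: Proposes a lightweight adapter that reuses frozen pretrained RGB denoisers through projection mappings for hyperspectral image denoising, deblurring, and super-resolution; experiments indicate that RGB priors transfer well.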

2605.24767 2026-05-26 cs.RO

Enhanced INS/GNSS State Estimation using GNSS-Based Acceleration Measurements

增强的INS/GNSS状态估计：利用基于GNSS的加速度测量

Gal Versano, Itzik Klein

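Affiliations: Autonomous Navigation and Sensor Fusion Lab; Hatter Department of Marine Technologies; Charney School of Marine Sciences; University of Haifa

AI summary: Proposes extracting vehicle acceleration information from past GNSS measurements and a motion model and integrating it into the INS/GNSS filter to improve positioning robustness and accuracy, with average position root-mean-square error improvements of 11.40% and 20.74% on two real-world unmanned ground vehicle datasets.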

2605.24763 2026-05-26 cs.LG physics.flu-dyn

High-fidelity Modeling of Full-scale Pressurized Water Reactor Flow Fields for Machine Learning Applications

面向机器学习应用的全尺寸压水堆流场高保真建模

Logan A. Burnett, Hyungjun Kim, Hsien-Cheng Chou, Arsha Witoelar, Robert A. Brewster, Benoit Forget, Emilio Baglietto, Majdi I. Radaideh

Affiliations: Department of Nuclear Engineering and Radiological Sciences, University of Michigan; Department of Nuclear Science and Engineering, Massachusetts Institute of Technology; Korea Atomic Energy Research Institute; Department of Mechanical Engineering, University of Michigan; Department of Computer Science and Engineering, University of Michigan

AI summary: Using high-fidelity CFD simulations and machine learning models, this study characterizes assembly-level flow fields in a four-loop pressurized water reactor, shows that cold-leg swirl and lower-plenum transport produce heterogeneous inlet flow distributions, and finds that spatially aware architectures such as ConvLSTM outperform LSTM and DeepONet in flow-field reconstruction and prediction.

Comments 30 pages, 10 figures, and 6 Tables

AI中文摘要

本工作提出了一个用于四环路压水堆组件级流动表征的高保真计算流体动力学和数据驱动建模框架。利用公开可用的几何和运行条件构建了完整的下腔室和堆芯入口域，实现了带有泵诱导旋流边界条件的瞬态模拟。结果表明，冷腿旋流和下腔室输运在堆芯下部区域产生强烈的非均匀组件级入口流量分布，而轴向阻力和混合作用逐渐使更高位置的流动均匀化。这些基于物理的数据集随后被用于评估机器学习在部分场重建和短期自回归预测中的应用。一个基于3D卷积的修复模型成功地从部分观测中重建了缺失的组件级质量流量，误差集中在高湍流底部层，并在上层显著减小。跨多个ML模型的比较分析表明，空间感知架构，特别是ConvLSTM，通过有效捕捉耦合的时空动态，显著优于基于序列的LSTM和算子学习DeepONet方法。研究还强调了关键挑战，包括入口流预测对湍流和网格分辨率的敏感性，以及缺乏全尺寸实验验证数据。尽管存在这些限制，结果仍与预期的物理行为一致。总体而言，本工作将高保真CFD确立为开发数据驱动代理模型、稀疏传感策略和未来多物理场耦合框架的关键基础。

英文摘要

This work presents a high-fidelity computational fluid dynamics (CFD) and data-driven modeling framework for assembly-level flow characterization in a four-loop pressurized water reactor (PWR). A full lower-plenum and core-inlet domain was constructed using publicly available geometry and operating conditions, enabling transient simulations with pump-induced swirl boundary conditions. The results show that cold-leg swirl and lower-plenum transport generate strongly heterogeneous assembly-wise inlet flow distributions, particularly near the lower core region, while axial resistance and mixing progressively homogenize the flow at higher elevations. These physics-informed datasets were subsequently used to evaluate machine learning (ML) applications for partial field reconstruction and short-term autoregressive prediction. A 3D convolutional-based inpainting model successfully reconstructed missing assembly-level mass flow rates from partial observations, with errors concentrated in the highly turbulent base (bottom) layer and diminishing significantly in upper layers. Comparative analysis across multiple ML models demonstrates that spatially aware architectures, particularly ConvLSTM, significantly outperform sequence-based (LSTM) and operator-learning (DeepONet) approaches by effectively capturing coupled spatio-temporal dynamics. The study also highlights key challenges, including the sensitivity of inlet flow predictions to turbulence and mesh resolution, as well as the absence of full-scale experimental validation data. Despite these limitations, the results remain consistent with expected physical behavior. Overall, this work establishes high-fidelity CFD as a critical foundation for developing data-driven surrogates, sparse sensing strategies, and future multiphysics coupling frameworks.

2605.24762 2026-05-26 cs.CV

4KLSDB: A Large-Scale Dataset for 4K Image Restoration and Generation

4KLSDB：用于4K图像恢复与生成的大规模数据集

Zihao Zhu, Kuan-Ru Huang, Zhaoming Xu, Renjie Li, Bo Wu, Ruizheng Bai, Mingyang Wu, Sayak Paul, Zhengzhong Tu

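Affiliations: Texas A&M University; Hugging Face

AI summary: To address the lack of native 4K resolution and scale in existing datasets, the paper introduces 4KLSDB, a large-scale dataset of 129,484 4K images whose quality is ensured by multi-stage automated filtering and annotation; experiments show significant gains on 4K benchmarks when super-resolution and diffusion models are trained on it.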

Comments Accepted to the DataCV Workshop at CVPR 2026; 10 pages, 4 figures, 7 tables; Our project page is available at: https://4klsdb.github.io/

AI中文摘要

高分辨率数据集对于推进超分辨率（SR）和文本到图像（T2I）扩散研究至关重要。然而，当前公开可用的数据集既缺乏原生4K分辨率，也缺乏训练最先进模型所需的大规模。为解决这一差距，我们引入了一个4K大规模数据集与基准（4KLSDB），这是一个大规模、多样化的数据集，包含129,484张精心策划的4K分辨率图像，涵盖自然、城市景观、人物、食物、艺术品和CGI等多个类别，以及分别包含2,000和1,984张图像的独立验证集和测试集。图像来源于已建立的开放数据集，包括Photo Concept Bucket、Laion2B和PD12M。4KLSDB经历了严格的多阶段自动过滤和标注流程，涉及人工标注员和大规模多模态模型（LMMs），以确保高美学质量和数据集一致性。我们通过训练代表性的超分辨率和扩散模型来证明4KLSDB的有效性，观察到在原生4K基准上性能的显著提升。综合实验表明，在真实4K分辨率数据上训练与图像恢复任务中保真度的提高之间存在正相关，尤其是在4K分辨率下。我们通过提供4KLSDB，为研究社区提供宝贵资源，以推动真正高保真图像合成与恢复的进展。我们的项目页面位于：https://4klsdb.github.io/。

英文摘要

High-resolution datasets are essential for advancing super-resolution (SR) and text-to-image (T2I) diffusion research. However, current publicly available datasets lack both the native 4K resolution and the extensive scale necessary for training state-of-the-art models. To address this gap, we introduce a 4K Large Scale Dataset and Benchmark (4KLSDB), a large-scale, diverse dataset consisting of 129,484 carefully curated 4K resolution images spanning multiple categories such as nature, urban scenes, people, food, artwork, and CGI, alongside distinct validation and test sets containing 2,000 and 1,984 images respectively. Images were sourced from established open datasets including Photo Concept Bucket, Laion2B, and PD12M. 4KLSDB underwent rigorous multi-stage automated filtering and annotation pipelines involving both human annotators and Large Multimodal Models (LMMs) to ensure high aesthetic quality and dataset consistency. We demonstrate 4KLSDB's effectiveness by training representative super-resolution and diffusion models, observing significant improvements in performance on native 4K benchmarks. Comprehensive experiments illustrate a positive correlation between training on true 4K resolution data and improved fidelity in image restoration tasks, especially at 4K resolution. By releasing 4KLSDB, we provide the research community with a valuable resource to drive progress toward genuinely high-fidelity image synthesis and restoration. Our project page is available at: https://4klsdb.github.io/.

2605.24761 2026-05-26 cs.CV cs.RO

Drift-Resistant Navigation World Model with Anchored Epipolar Guidance

抗漂移导航世界模型与锚定对极引导

Po-Chien Luan, Zimin Xia, Wuyang Li, Yang Gao, Alexandre Alahi

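Affiliations: EPFL

AI summary: Proposes a drift-resistant navigation world model that mitigates both perceptual drift and geometric drift through anchor-guided rollout and bidirectional epipolar geometric constraints, improving long-horizon visual quality, geometric consistency, and multi-view coherence.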

AI中文摘要

我们提出抗漂移导航世界模型，这是一种生成模型，可减轻传统基于滚动的导航世界模型中的感知漂移和几何漂移。现有方法递归地将生成内容馈送到后续步骤，导致噪声累积和预测退化，即感知漂移。同时，它们的预测通常偏离智能体的运动，导致几何漂移。我们通过将世界模型预测重新设计为锚定引导滚动来解决这两种漂移。我们不顺序滚动每一帧，而是首先预测稀疏的未来锚点，作为稳定的长期目标，然后生成每个块内的中间帧，这些帧以过去上下文和未来锚点为条件。重要的是，这些稀疏锚点还提供几何约束，由双向对极几何支持，以定位中间帧中相应内容应出现的位置。在四个基准上的实验表明，在长期视觉质量、几何一致性和多视图连贯性方面，相对于强基线有一致的改进。这些提升进一步转化为相同规划器下下游规划性能的提高，突显了抗漂移、几何感知预测对于可靠导航世界模型的重要性。

英文摘要

We propose Drift-Resistant Navigation World Model, a generative model that mitigates both perceptual drift and geometric drift in conventional rollout-based navigation world models. Existing methods recursively feed generated content into subsequent steps, causing noise accumulation and degraded predictions, i.e., perceptual drift. Meanwhile, their predictions often deviate from the agent's motion, resulting in geometric drift. We address both types of drift by redesigning world-model prediction as an anchor-guided rollout. Instead of rolling out every frame sequentially, we first predict sparse future anchors that serve as stable long-range targets, and then generate intermediate frames within each chunk conditioned on both past context and future anchors. Importantly, these sparse anchors also provide geometric constraints, supported by bidirectional epipolar geometry, to localize where corresponding content should appear in the intermediate frames. Experiments on four benchmarks demonstrate consistent improvements over strong baselines in long-horizon visual quality, geometric consistency, and multi-view coherence. These gains further translate into improved downstream planning performance under the same planners, highlighting the importance of drift-resistant, geometry-aware prediction for reliable navigation world models.

2605.24760 2026-05-26 cs.RO

Geometric Workspace Analysis and Transmission-Aware Dynamics of a Serial Spherical Tool for Microsurgery

显微外科用串行球形工具的几何工作空间分析与传动感知动力学

Anestis Mablekos-Alexiou, Lyndon da Cruz, Christos Bergeles

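Affiliations: Moorfields Eye Hospital NHS Foundation Trust; King’s College London

AI summary: Proposes a kinematic and transmission-aware design framework for a serial spherical mechanism with an additional translational degree of freedom for microsurgery, enabling rapid design evaluation through analytical workspace formulas and a transmission-aware dynamics method.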

2605.24759 2026-05-26 cs.LG

A Contractive Feedback Semantics for Reinforcement Learning

强化学习的收缩反馈语义

Zuyuan Zhang

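Affiliations: The George Washington University

AI summary: By treating a one-step decision process as an open stochastic component and obtaining infinite-horizon policy evaluation by closing a contractive feedback loop, the paper establishes a compositional semantics for reinforcement learning and derives theoretical results on approximate equivalence, state abstraction, and contract specifications.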

AI中文摘要

折扣强化学习通常通过闭马尔可夫决策过程上的贝尔曼方程来呈现。本文发展了一种组合视角：将单步决策过程视为开放随机组件，并通过闭合收缩反馈环实现无限时域策略评估。由此产生的语义为开放组件分配了类型化的贝尔曼变换器，将串联和并联布线解释为变换器的复合和张量，并将反馈解释为由唯一不动点实现的可容许有界守护迹。这一视角产生了三个理论结果。第一，近似组件等价是对于可容许的良类型守护单孔上下文的上下文同余：局部算子误差在将组件插入周围电路后仍受控，该电路使用该孔一次且其反馈节点具有认证的均匀守护性。第二，精确和近似状态抽象成为交换或近交换的余代数图，从而给出值保持和显式 sup-norm 失真界。第三，在单调 ω-连续合约变换器语义下，安全性、风险和资源规范可以表示为量值值合约，其中局部归纳界通过最小不动点推理提升到布线和反馈中。其核心主张并非所有强化学习态射构成全局迹幺半范畴，而是折扣贝尔曼评估在守护电路的可容许类上允许收缩反馈语义。

英文摘要

Discounted reinforcement learning is usually presented through Bellman equations on closed Markov decision processes. This paper develops a compositional view: a one-step decision process is treated as an open stochastic component, and infinite-horizon policy evaluation is obtained by closing a contractive feedback loop. The resulting semantics assigns typed Bellman transformers to open components, interprets series and parallel wiring as composition and tensoring of transformers, and interprets feedback as an admissible guarded Banach trace realized by a unique fixed point. This perspective yields three theoretical consequences. First, approximate component equivalence is a contextual congruence for admitted well-typed guarded one-hole contexts: local operator error remains controlled after plugging the component into a surrounding circuit that uses the hole once and whose feedback nodes have certified uniform guardedness. Second, exact and approximate state abstractions become commuting or near-commuting coalgebraic diagrams, giving value-preservation and explicit sup-norm distortion bounds. Third, under monotone $ω$-continuous contract-transformer semantics, safety, risk, and resource specifications can be represented as quantale-valued contracts, where local inductive bounds lift through wiring and feedback by least-fixed-point reasoning. Its central claim is not that all RL morphisms form a global traced monoidal category, but that discounted Bellman evaluation admits a contractive feedback semantics on the admissible class of guarded circuits.
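The closed-system special case of the abstract above is the familiar fact that discounted policy evaluation is the unique fixed point of a sup-norm contraction. The sketch below shows only that base case on a hypothetical two-state chain under a fixed policy; it does not model open components, wiring, or guarded traces, and the names are assumptions.

```python
def bellman(v, rewards, transitions, gamma):
    """One application of the policy-evaluation operator T(v) = r + gamma * P v."""
    return [
        rewards[s] + gamma * sum(p * v[t] for t, p in enumerate(transitions[s]))
        for s in range(len(v))
    ]


def sup_dist(u, v):
    return max(abs(a - b) for a, b in zip(u, v))


def evaluate(rewards, transitions, gamma, tol=1e-10):
    """Close the feedback loop: iterate T until successive values agree within tol."""
    v = [0.0] * len(rewards)
    while True:
        new_v = bellman(v, rewards, transitions, gamma)
        if sup_dist(new_v, v) < tol:
            return new_v
        v = new_v


rewards = [1.0, 0.0]
transitions = [[0.0, 1.0], [1.0, 0.0]]   # the two states swap deterministically
gamma = 0.5

values = evaluate(rewards, transitions, gamma)
print(values)   # close to [4/3, 2/3], the solution of v = r + gamma * P v
```

For this chain the fixed point solves v0 = 1 + 0.5 v1 and v1 = 0.5 v0, which gives v0 = 4/3 and v1 = 2/3. The discount factor is what makes the loop contractive; with gamma = 1 the iteration above would not converge here.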

2605.24756 2026-05-26 cs.AI

Proper Scoring Rules for Agentic Uncertainty Quantification

智能体不确定性量化的适当评分规则

Suresh Raghu, Satwik Pandey, Shashwat Pandey

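Affiliations: Independent Researcher

AI summary: For uncertainty signals emitted along language-model agent trajectories, the paper proposes TPS, a family of strictly proper trajectory-level scoring rules for evaluating per-step success-probability processes, and extends it to censored trajectories.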

Comments 38 pages, 2 figures

AI中文摘要

语言模型智能体在轨迹中越来越多地发出不确定性信号，但现有的智能体不确定性量化评估常常混淆排序有用性与概率真实性。AUROC、AUPRC、风险覆盖、轨迹ECE和标量化轨迹评分评估了区分度、分箱校准或压缩摘要，但并未严格引出完整的基于前缀的条件成功概率轨迹$q_t = P^π(Y=1 | H_t)$。基于序列适当评分，我们引入了轨迹适当评分（TPS），这是一个预测器无关的严格适当的轨迹级评分规则族，适用于任何校准为最终成功概率的逐步骤不确定性信号。我们证明，在完全观测下，TPS在所选的评分族和权重方案内严格引出了成功概率过程。我们将构造扩展到行政删失轨迹，通过将完整数据评分投影到可观测的停止前缀上，得到精确的$q_Z$加权简化评分，并在$q_Z$未估计时得到可处理的近似。我们进一步表明，常见的轨迹评估器针对的是比完整前缀条件概率过程更弱的目标：轨迹ECE是分辨率盲的，而标量化轨迹Brier仅引出压缩标量，而非完整轨迹。在StrategyQA、Tau2-Bench、HotpotQA和WebShop上的实验表明，这些理论差异在操作上是可见的：概率重新校准可以显著改变TPS，而几乎不改变排序指标，并且可处理的删失近似相对于仅完整评估可能改变结论。

英文摘要

Language-model agents increasingly emit uncertainty signals throughout a trajectory, but existing agentic UQ evaluations often conflate ranking usefulness with probabilistic truthfulness. AUROC, AUPRC, risk-coverage, Trajectory ECE, and scalarized trajectory scores evaluate discrimination, binwise calibration, or collapsed summaries, but do not strictly elicit the full prefix-conditioned success-probability trace $q_t = P^π(Y=1 | H_t)$. Building on prequential proper scoring, we introduce the Trajectory Proper Score (TPS), a predictor-agnostic family of strictly proper trajectory-level scoring rules for any per-step uncertainty signal calibrated into a probability of eventual success. We prove that TPS strictly elicits the success-probability process under complete observation, within the chosen score family and weight schedule. We extend the construction to administratively censored trajectories by projecting the complete-data score onto the observable stopped prefix, yielding an exact $q_Z$-weighted reduced score and a tractable approximation when $q_Z$ is unestimated. We further show that common trajectory evaluators target weaker objects than the full prefix-conditioned probability process: Trajectory ECE is resolution-blind, while scalarized Trajectory Brier elicits only the collapsed scalar, not the full trace. Experiments on StrategyQA, Tau2-Bench, HotpotQA, and WebShop show that these theoretical distinctions are operationally visible: probability recalibration can substantially change TPS while leaving rank metrics nearly unchanged, and the tractable censored approximation can change the verdict relative to complete-only evaluation.
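The contrast drawn in the abstract above, between scoring the full probability trace and scoring a collapsed scalar, can be made concrete with a small example. The exact TPS definition is not given in the abstract, so the sketch assumes one simple member of such a family: a positively weighted sum of per-step Brier scores against the final outcome. The scalarized variant (Brier score of the mean probability) is likewise an assumed stand-in, and censoring is not handled.

```python
def trajectory_brier(probs, outcome, weights=None):
    """Weighted sum of per-step Brier scores against the final outcome (lower is better)."""
    if weights is None:
        weights = [1.0 / len(probs)] * len(probs)   # assumed uniform weight schedule
    return sum(w * (q - outcome) ** 2 for w, q in zip(weights, probs))


def scalarized_brier(probs, outcome):
    """Collapse the trace to its mean first, then score that single number."""
    mean = sum(probs) / len(probs)
    return (mean - outcome) ** 2


rising = [0.2, 0.5, 0.8]   # hypothetical agent that grows more confident
flat = [0.5, 0.5, 0.5]     # hypothetical agent that never updates
outcome = 1                # the trajectory eventually succeeded

# Both traces have the same mean, so the scalarized score cannot tell them apart,
# while the per-step score still depends on the shape of the trace.
print(scalarized_brier(rising, outcome), scalarized_brier(flat, outcome))
print(trajectory_brier(rising, outcome), trajectory_brier(flat, outcome))
```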

2605.24755 2026-05-26 cs.AI cs.CL

Automated Detection and Classification of Delusion-related Content in Naturalistic Audio Diaries Using Multi-Agent Language Models

使用多智能体语言模型自动检测和分类自然音频日记中的妄想相关内容

Feng Chen, Justin Tauscher, Changye Li, Meliha Yetisgen, Alex Cohen, Adam Kuczynski, Angelina Pei-Tzu Tsai, Benjamin Buck, Dror Ben-Zeev, Trevor Cohen

Affiliations: Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA; Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, USA; Department of Psychology, Louisiana State University, Baton Rouge, LA, USA; Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA

AI summary: Proposes a multi-agent LLM pipeline that automatically detects and classifies delusional beliefs and the associated affective and behavioral responses in naturalistic audio diaries, with majority voting giving robust performance.

Comments Accepted by CLPsych 2026

AI中文摘要

在自然环境中录制的言语独白为表征精神疾病现象学和检测症状恶化提供了机会。大型语言模型（LLM）为自动化这一过程提供了新的可能性，因为它们主要需要标注数据进行评估而非训练。在本文中，我们提出了一种新颖的自动化多智能体LLM流水线，用于从具有中度被害妄想的人的音频日记转录中，进行细粒度、多标签的提取，以识别暗示妄想信念、相关情感反应和行为反应的语言。通过评估三个基础模型的集成，我们证明详细的诊断提示指令成功减少了妄想主题分类的假阳性，但也限制了情感或行为反应的解读。此外，比较多智能体裁决框架表明，智能体之间的复杂对话辩论通过诱导过早共识降低了临床模糊文本的准确性。相反，多数投票建立了稳健的性能（妄想检测和分类的Micro F1分别为0.872和0.779）。这项工作为自动检测和表征自然言语中暗示妄想信念的内容提供了一个经过验证且可扩展的流水线。

英文摘要

Speech monologues recorded in naturalistic settings provide opportunities to characterize mental illness phenomenology and detect symptom exacerbation. Large language models (LLMs) offer new possibilities for automating this process, as they require annotated data primarily for evaluation rather than training. In this paper, we present a novel automated, multi-agent LLM pipeline for the fine-grained, multi-label extraction of language suggestive of delusional beliefs, associated affective responses, and behavioral responses from transcripts of naturalistic audio diaries collected from people with moderate persecutory ideation. Evaluating an ensemble of three foundation models, we demonstrate that detailed diagnostic prompt instructions successfully reduce false positives for delusional theme classification, but also constrain the interpretation of affective or behavioral responses. Furthermore, comparing multi-agent adjudication frameworks shows that complex conversational debate between agents diminishes accuracy on clinically ambiguous text by inducing premature consensus. Instead, majority voting establishes robust performance (Micro F1 of 0.872 and 0.779 for delusion detection and classification respectively). This work provides a validated and scalable pipeline for the automated detection and characterization of content suggesting delusional beliefs in naturalistic speech.
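The majority-voting adjudication and micro-averaged F1 mentioned in the abstract above are sketched below for a multi-label setting. The label names, the strict-majority threshold, and the function names are illustrative assumptions; the paper's prompts, models, and label schema are not reproduced.

```python
from collections import Counter


def majority_vote(label_sets):
    """Keep a label when more than half of the models assigned it."""
    counts = Counter(label for labels in label_sets for label in set(labels))
    return {label for label, c in counts.items() if c > len(label_sets) / 2}


def micro_f1(predicted, gold):
    """Micro-averaged F1 over all (document, label) decisions."""
    tp = sum(len(p & g) for p, g in zip(predicted, gold))
    fp = sum(len(p - g) for p, g in zip(predicted, gold))
    fn = sum(len(g - p) for p, g in zip(predicted, gold))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


# Hypothetical outputs of three models for one diary transcript.
model_outputs = [
    {"persecutory", "fear"},
    {"persecutory"},
    {"fear", "avoidance"},
]
consensus = majority_vote(model_outputs)
print(sorted(consensus))                                   # ['fear', 'persecutory']
print(micro_f1([consensus], [{"persecutory", "fear", "avoidance"}]))   # 0.8
```

Each model votes independently here, which matches the abstract's observation that voting avoids the premature consensus that conversational debate can induce.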

2605.24754 2026-05-26 cs.CV cs.AI cs.LG

Motion-Compensated Weight Compression

运动补偿权重压缩

Ismail Lamaakal

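Affiliations: Multidisciplinary Faculty of Nador, Mohammed Premier University

AI summary: Proposes Motion-Compensated Weight Compression (MCWC), which compresses neural network weights by aligning permutation-symmetric blocks and applying layer-sequential prediction and entropy coding, improving the rate-accuracy Pareto frontier on Transformer language modeling and vision classification.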

Comments 54 pages, 17 tables, 6 Figures

AI中文摘要

神经网络权重日益成为部署的瓶颈，然而大多数压缩流水线独立处理各层，忽略了由函数保持对称性引起的跨层冗余。我们提出运动补偿权重压缩（MCWC），一种仅权重的编解码器，它对齐置换对称块（例如隐藏单元和注意力头）以最大化跨层对应，将深度转化为可预测序列。在对齐的坐标系中，MCWC使用带有周期性关键帧的轻量级层序预测器，并仅编码在率失真目标下训练的学习熵模型预测残差。一个简单的解码器通过熵解码、反量化、预测驱动重建和逆对齐来重建可部署的权重，从而实现快速权重物化以进行推理。在Transformer语言建模和视觉分类中，MCWC在强量化和学习权重编解码基线之上改善了率-精度帕累托前沿，同时保持有竞争力的解码时间。消融实验证实，对齐、预测、熵建模和关键帧调度对于获得全部增益都是必要的。我们的代码可通过 https://github.com/Ism-ail11/MCWC 获取。

英文摘要

Neural network weights are increasingly a bottleneck for deployment, yet most compression pipelines treat layers independently and overlook cross-layer redundancy induced by function-preserving symmetries. We propose Motion-Compensated Weight Compression (MCWC), a weight-only codec that aligns permutation-symmetric blocks (e.g., hidden units and attention heads) to maximize cross-layer correspondence, turning depth into a predictable sequence. In the aligned coordinate system, MCWC uses a lightweight layer-sequential predictor with periodic keyframes and encodes only quantized prediction residuals using a learned entropy model trained under a rate-distortion objective. A simple decoder reconstructs deployable weights by entropy decoding, dequantization, predictor-driven reconstruction, and inverse alignment, enabling fast weight materialization for inference. Across Transformer language modeling and vision classification, MCWC improves the rate-accuracy Pareto frontier over strong quantization and learned weight-codec baselines, while maintaining competitive decode time. Ablations confirm that alignment, prediction, entropy modeling, and keyframe scheduling are each necessary for the full gains. Our code is available via https://github.com/Ism-ail11/MCWC.
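The prediction-plus-residual part of the pipeline in the abstract above resembles predictive video coding, and its skeleton can be sketched independently of the official repository. The version below is a simplification under stated assumptions: layers are already aligned and have equal shape, the "predictor" is simply the previously reconstructed layer, quantization is uniform, and entropy coding is omitted. Function names are hypothetical.

```python
def encode(layers, step, keyframe_every):
    """Turn a list of equally sized weight vectors into integer residual symbols."""
    symbols, prev = [], None
    for i, weights in enumerate(layers):
        # Keyframe layers are coded from scratch; the others are predicted from
        # the previously reconstructed layer (a stand-in for a learned predictor).
        pred = [0.0] * len(weights) if i % keyframe_every == 0 else prev
        q = [round((w - p) / step) for w, p in zip(weights, pred)]
        symbols.append(q)
        # Track what the decoder will reconstruct so quantization error cannot accumulate.
        prev = [p + s * step for p, s in zip(pred, q)]
    return symbols


def decode(symbols, step, keyframe_every):
    """Rebuild the weights: dequantize residuals and add the running prediction."""
    layers, prev = [], None
    for i, q in enumerate(symbols):
        pred = [0.0] * len(q) if i % keyframe_every == 0 else prev
        prev = [p + s * step for p, s in zip(pred, q)]
        layers.append(prev)
    return layers


layers = [[0.10, -0.20], [0.12, -0.18], [0.15, -0.21]]   # hypothetical aligned layers
step, keyframe_every = 0.01, 2

symbols = encode(layers, step, keyframe_every)
decoded = decode(symbols, step, keyframe_every)
print(symbols)   # residuals of predicted layers are much smaller than keyframe symbols
```

Small residual symbols are what an entropy coder can store cheaply; the keyframe interval trades that saving against the ability to start decoding partway through the network.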

2605.24753 2026-05-26 cs.CV

Ghosts in the Point Clouds: De-glaring LiDAR in the Transient Domain

点云中的鬼影：瞬态域中的LiDAR去眩光

Avery Gump, Connor Henley, Sungjin Cheong, Akarsh Prabhakara, Mohit Gupta

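Affiliations: University of Wisconsin–Madison

AI summary: Addresses artifacts caused by internal-multipath glare in solid-state LiDAR with a physics-based model built on the Transient Glare Spread Function (TGSF) and a training-free algorithm that suppresses glare before point-cloud formation while preserving true scene structure.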

Comments CVPR 2026

AI中文摘要

现代LiDAR正迅速从笨重的机械扫描系统过渡到超紧凑、低成本、固态阵列。这种微型化在实现可扩展性、经济性和类似相机的数据结构的同时，引入了一种新的严重故障模式：内部多径眩光。当来自明亮或高反射表面的光在LiDAR内部反射和散射时，本应到达单个像素的光会扩散到像素阵列上。由此产生的伪影会创建幻影物体、遮挡真实物体，并产生安全关键的“点云中的鬼影”。本文介绍了一种基于物理的传感模型和算法技术来解决这一效应。我们表明，内部眩光可以表示为作用于瞬态测量的线性、场景无关算子——瞬态眩光扩散函数（TGSF）。基于此模型，我们开发了一种无训练方法，在点云形成之前对低级LiDAR检测（或回波）进行操作，利用眩光扩散函数的知识来推理每个检测来自眩光的可能性。该方法与现有LiDAR信号处理流水线兼容，可在未经修改的商业传感器上部署。通过使用真实单光子LiDAR硬件的实验，我们证明了在保留真实场景结构的同时，显著抑制了严重眩光伪影。

英文摘要

Modern LiDARs are rapidly transitioning from bulky, mechanically scanned systems to ultra-compact, low-cost, solid-state arrays. This miniaturization, while enabling scalability, affordability, and camera-like data structures, introduces a new and severe failure mode: internal-multipath glare. When light from a bright or retroreflective surface reflects and scatters within the LiDAR, light that should reach a single pixel spreads across the pixel array. The resulting artifacts create phantom objects, obscure real ones, and produce safety-critical "ghosts in the point clouds." This paper introduces a physically grounded sensing model and algorithmic techniques for addressing this effect. We show that internal glare can be represented as a linear, scene-independent operator, the Transient Glare Spread Function (TGSF), acting on the transient measurements. Building on this model, we develop a training-free approach that operates on low-level LiDAR detections (or echoes) prior to point-cloud formation, leveraging knowledge of the glare spread function to reason about the likelihood of each detection arising from glare. The resulting approach is compatible with existing LiDAR signal-processing pipelines, and deployable on unmodified commercial sensors. Using experiments with real single-photon LiDAR hardware, we demonstrate substantial suppression of severe glare artifacts while preserving true scene structure.

2605.24752 2026-05-26 cs.LG cs.CC cs.DS math.PR

A computational phase transition for learning-to-sample from Ising models

从Ising模型中学习采样的计算相变

Andrej Risteski, Thuy-Duong Vuong

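Affiliations: Machine Learning Department, Carnegie Mellon University; Department of Computer Science and Engineering, UC San Diego

AI summary: Constructs a family of bounded-width Ising models just beyond the spectral threshold and shows that learning-to-sample is computationally hard for it under standard cryptographic assumptions, establishing a sharp computational phase transition at the spectral threshold.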

AI中文摘要

我们研究从Ising模型中\emph{学习采样}——这是生成模型背后的基本算法任务，Ising模型是理论计算机科学和机器学习中算法思想的标准测试平台。给定未知目标分布的独立同分布样本，学习采样的目标是学习一个计算高效的生成过程，产生近似相同分布的新样本。我们构造了一个常界宽度的Ising模型族，该族恰好位于谱阈值$λ_{\max}(J)-λ_{\min}(J)=1$之上，并表明在标准密码学假设下，即使学习者获得模型的多项式多个独立同分布样本以及对其参数的显式访问，对该族的学习采样在计算上也是困难的。结合[AJKPV24,KLV25]的结果（表明谱阈值以下学习采样是可处理的），这建立了在谱阈值处的一个尖锐计算相变。此外，结合先前关于有界宽度Ising模型参数学习的结果[KM17,WSD19,VML20]，这表明学习采样可能比参数学习更困难。最后，我们表明，对于这些困难实例，任何高效的学习者都表现出一种自然的记忆-幻觉二分法：学习者要么输出经过简单变换后与（变换后的）训练数据匹配的配置，要么将大量质量放在目标分布下概率可忽略的配置上。

英文摘要

We study \emph{learning-to-sample} -- a basic algorithmic task underlying generative modeling -- for Ising models, a standard testbed for algorithmic ideas in both theoretical computer science and machine learning. Given i.i.d. samples of an unknown target distribution, the goal of learning-to-sample is to learn a computationally efficient generation procedure that produces new samples following approximately the same distribution. We construct a family of Ising models of constantly bounded-width which lie just beyond the spectral threshold $λ_{\max}(J)-λ_{\min}(J)=1$, and show that learning-to-sample for this family is computationally hard under standard cryptographic assumptions, even when the learner is given both polynomially many i.i.d. samples from the model and explicit access to its parameters. Combined with results of [AJKPV24,KLV25] showing tractability of learning-to-sample below the spectral threshold, this establishes a sharp computational phase transition at the spectral threshold. Moreover, combined with prior results on parameter learning for bounded-width Ising models [KM17,WSD19,VML20], this shows that learning-to-sample can be more difficult than parameter learning. Finally, we show that any efficient learner for these hard instances exhibits a natural memorization-hallucination dichotomy: the learner must either output configurations that, after a simple transformation, match the (transformed) training data or place substantial mass on configurations of negligible probability under the target distribution.
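The quantity that defines the threshold, $λ_{\max}(J)-λ_{\min}(J)$, is simple to compute for a concrete interaction matrix. The sketch below assumes $J$ is given as a real symmetric matrix; the function name `spectral_width` is hypothetical, and the check says nothing about hardness for any particular model, since the abstract's hardness result concerns a specifically constructed family.

```python
import numpy as np

def spectral_width(J):
    """Return lambda_max(J) - lambda_min(J) for a real symmetric matrix J."""
    J = np.asarray(J, dtype=float)
    if J.ndim != 2 or J.shape[0] != J.shape[1] or not np.allclose(J, J.T):
        raise ValueError("J must be a square symmetric matrix")
    eig = np.linalg.eigvalsh(J)  # eigenvalues in ascending order
    return float(eig[-1] - eig[0])
```

For two spins with coupling b, J = [[0, b], [b, 0]] has eigenvalues +b and -b, so the width is 2|b|; b = 0.5 sits exactly at the threshold value 1.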

2605.24743 2026-05-26 cs.LG cs.AI

Bilevel Optimization of Synthetic Trajectories for Multi-Turn LLM Fine-Tuning

用于多轮LLM微调的合成轨迹的双层优化

Shresth Verma, Mauricio Tec, Cheol Woo Kim, Kai Wang, Milind Tambe

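Affiliations: Harvard University; Georgia Institute of Technology

AI summary: Proposes BOOST, a bilevel optimization framework in which the inner level trains on reweighted data and the outer level learns a lightweight reweighting head, addressing the degradation in multi-turn LLM performance caused by heterogeneous synthetic-trajectory quality.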

AI中文摘要

虽然LLM在单轮生成中表现出色，但在长程多轮交互中表现不佳。离线强化学习提供了一种可扩展的方法，但其性能依赖于多轮轨迹数据的可用性和质量。一种常见的补救措施是使用LLM或模拟器生成的合成轨迹来增强训练，但合成数据的质量高度异质，天真地将所有轨迹视为同等信息量会降低性能。我们提出BOOST，一个双层优化框架，其中内层在重新加权的数据上训练LLM，外层在保留的真实验证任务上训练一个轻量级的重加权头，无需外部评判器即可分配连续的轨迹级权重。为了夯实这一方法，我们推导出一个PAC-Bayesian界，揭示了三方权衡：合成数据增加了多样性但存在任务偏移风险，而将权重集中在高质量轨迹上提高了经验性能但以有效样本量为代价。实验上，我们的方法一致优于多个基线。分析表明，它提高了与真实数据分布一致且具有更高定性价值的合成轨迹的权重。

英文摘要

While LLMs excel at single-turn generation, they struggle with long-horizon, multi-turn interactions. Offline reinforcement learning (RL) offers a scalable approach, yet its performance hinges on the availability and quality of multi-turn trajectory data. A common remedy is to augment training with synthetic trajectories generated by LLMs or simulators, but synthetic data is highly heterogeneous in quality, and naively treating all trajectories as equally informative can degrade performance. We propose BOOST, a bilevel optimization framework where the inner level trains the LLM on reweighted data and the outer level trains a lightweight reweighting head on held-out real validation tasks, assigning continuous trajectory-level weights without requiring an external judge. To ground this approach, we derive a PAC-Bayesian bound revealing a three-way trade-off: synthetic data increases diversity but risks task-shift, while concentrating weight on high-quality trajectories improves empirical performance at the cost of effective sample size. Empirically, our method consistently outperforms multiple baselines. Analysis reveals it upweights synthetic trajectories that align with the real data distribution and exhibit higher qualitative merit.
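The cost side of the trade-off, losing effective sample size as weight concentrates on a few trajectories, can be made concrete with Kish's formula, ESS = (Σw)² / Σw². This is a standard definition used here as an assumption; the quantity appearing in the paper's PAC-Bayesian bound may be defined differently. The function name is hypothetical.

```python
import numpy as np

def effective_sample_size(weights):
    """Kish effective sample size of non-negative trajectory weights."""
    w = np.asarray(weights, dtype=float)
    if w.ndim != 1 or np.any(w < 0) or w.sum() == 0:
        raise ValueError("weights must be a non-negative 1-D array with positive sum")
    return float(w.sum() ** 2 / np.sum(w ** 2))
```

Uniform weights over n trajectories give an ESS of n, while putting all weight on a single trajectory gives 1, so a reweighting head that is too selective effectively trains on very little data.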

2605.24742 2026-05-26 cs.LG

Aligning Molecular Graph Explanations with Chemical Identity via InChIfied Invariants

通过InChIfied不变量将分子图解释与化学身份对齐

Emanuele Guidotti, Sara Puglioli

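Affiliations: University of Lugano; Philochem AG

AI summary: Proposes InChIfied Invariants, InChI-based node, edge, and graph features that give chemically equivalent molecular graphs consistent representations, improving the consistency of predictions and explanations.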

AI中文摘要

在分子图上进行机器学习时，获得一致的解释需要预测和归因与化学身份对齐。然而，同一分子的化学等价图示可能产生不同的分子表示，导致不一致的预测和解释。在这里，我们引入了InChIfied不变量，这是一类基于国际化学标识符（InChI）的节点、边和图特征，设计为在保持化学身份的变换下具有不变性。使用来自PubChem Substances的一百万个分子图，我们表明InChIfied不变量在99.62%的情况下为化学等价图生成相同的表示，而标准的Daylight不变量仅在0.35%的情况下如此。在MoleculeNet任务中，InChIfied不变量在保持预测性能的同时，显著提高了同一分子不同图描绘之间的预测一致性。我们进一步进行了定量归因分析，并表明使用标准分子特征化方法产生的解释在化学等价图之间差异很大，而InChIfied不变量通过构造强制一致归因。我们发布了实现InChIfied不变量的开源软件，可作为标准分子图特征的即插即用替代品。

英文摘要

Obtaining consistent explanations for machine learning on molecular graphs requires predictions and attributions to be aligned with chemical identity. However, chemically equivalent drawings of the same molecule can induce different molecular representations, leading to inconsistent predictions and explanations. Here, we introduce InChIfied Invariants, a class of node, edge, and graph features based on the International Chemical Identifier (InChI) and designed to be invariant under transformations that preserve chemical identity. Using one million molecular graphs from PubChem Substances, we show that InChIfied Invariants produce identical representations for chemically equivalent graphs in 99.62% of cases, whereas standard Daylight invariants do so in only 0.35% of cases. Across MoleculeNet tasks, InChIfied Invariants preserve predictive performance while significantly improving prediction consistency across alternative graph depictions of the same molecules. We further perform a quantitative attribution analysis and show that explanations produced with standard molecular featurization methods vary substantially across chemically equivalent graphs, while InChIfied Invariants enforce consistent attributions by construction. We release open-source software implementing InChIfied Invariants, which can be used as a drop-in replacement for standard molecular graph features.

2605.24740 2026-05-26 cs.LG cs.GT

Reinforcement Learning for Reachability: Guaranteeing Asymptotic Optimality

可达性的强化学习：保证渐近最优性

Amogh Palasamudram, Jakub Svoboda, Suguman Bansal, Krishnendu Chatterjee

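Affiliations: Institute of Science and Technology, Austria; Georgia Institute of Technology, USA; Dartmouth College, USA

AI summary: For reinforcement learning with reachability specifications, proposes an iterative method based on PAC learning that reaches asymptotically optimal policies without knowing internal MDP parameters, and validates the convergence dynamics experimentally.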

Comments Main text and appendix of work accepted in ICML 2026

AI中文摘要

强化学习（RL）在可达性规格中的应用是序列决策的基础，但理论保证仍较少探索。最近的工作实现了向最优策略的渐近收敛。然而，该方法对收敛动态的洞察有限。在这项工作中，我们提出了一种替代方法，提供了对收敛更深入的理论洞察。我们的方法基于带有假设的PAC学习。PAC学习保证在有限时间内以高置信度获得接近最优的策略，但需要知道内部MDP参数，如最小转移概率。我们认为，虽然这些参数在RL中是未知的，但它们可以迭代地细化并以递增的精度估计。通过迭代满足PAC条件，我们证明了在极限情况下可以实现精确最优性。在标准基准上的实证评估验证了我们对收敛动态的理论洞察。

英文摘要

Reinforcement learning (RL) for reachability specifications is fundamental in sequential decision-making, yet theoretical guarantees remain less explored. A recent work achieves asymptotic convergence to optimal policies. However, this approach provides limited insight into convergence dynamics. In this work, we present an alternative approach that provides deeper theoretical insights into convergence. Our approach builds on PAC learning with assumptions. PAC learning guarantees near-optimal policies with high confidence in finite time but requires knowing internal MDP parameters like minimum transition probability. We argue that while these parameters are unknown in RL, they can be iteratively refined and estimated with increasing accuracy. By iteratively satisfying PAC conditions, we show that exact optimality can be achieved in the limit. Empirical evaluations on standard benchmarks validate our theoretical insights into convergence dynamics.

2605.24737 2026-05-26 cs.CL cs.AI cs.CY

Who judges the judges? Governance from metrics: a runtime framework for continuous LLM compliance monitoring

谁来评判评判者？基于指标的治理：面向持续LLM合规监控的运行时框架

Jehanne Dussert

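Affiliations: Independent Researcher

AI summary: Argues that AI compliance is treated as a binary audit-time verdict rather than a continuous, measurable property of production systems; proposes the governance-from-metrics principle and the open-source framework govllm, which monitors compliance continuously from runtime observability signals, and reports results that empirically motivate a multi-model jury design for regulatory evaluation.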

Comments 41 pages, 8 figures, preprint

AI中文摘要

当前AI合规方法将合规性视为审计时的二元判定，而非生产系统的持续可测量属性。我们认为这种合规虚构在结构上不适合欧盟AI法案的要求，该法案要求持续的人类监督和检测部署系统中涌现的行为漂移。我们引入了基于指标的治理原则，即监管合规性是从运行时可观测性中推导出的持续信号，而非来自静态评估。基于这一原则，我们提出了govllm，一个开源框架，实现了治理驱动的路由架构，其中模型选择由累积的合规分数决定，而非仅由延迟或成本决定。我们方法的核心是一个监管评判者小组——针对每个标准（欧盟AI法案、GDPR、ANSSI、可访问性）专门化的LLM评估器——我们将评判者间的分歧重新定义为监管不确定性信号，而非噪声，需要人工仲裁。我们通过一个包含49个标注提示/响应对的地面真实语料库验证了该方法，涵盖五个监管标准，由四个完全本地运行的小型语言模型（SLM，1.7B-7B参数）评估。一致率从51.5%（mistral:7b）到69.1%（phi4-mini）不等，没有单一模型在所有标准上占主导地位——这从经验上激励了“档案即陪审团”的设计。我们进一步记录了小型监管评判者中的三种结构性失败模式，以及一种评判者特定的位置偏差，该偏差在三种问题顺序条件（原始、反转、排列）下使一致率降低多达25个百分点。govllm作为开源软件发布，以支持可复现的AI治理研究。

英文摘要

Current approaches to AI compliance treat conformity as a binary, audit-time verdict rather than a continuous, measurable property of production systems. We argue that this compliance fiction is structurally ill-suited to the requirements of the EU AI Act, which demands ongoing human oversight and the detection of emergent behavioural drift in deployed systems. We introduce governance from metrics, a principle whereby regulatory compliance is derived as a continuous signal from runtime observability rather than from static assessments. Building on this principle, we present govllm, an open-source framework implementing a governance-driven routing architecture in which model selection is determined by accumulated compliance scores rather than by latency or cost alone. Central to our approach is a panel of regulatory judges - LLM evaluators specialised per criterion (EU AI Act, GDPR, ANSSI, accessibility) - whose inter-judge disagreement we reframe not as noise but as a regulatory uncertainty signal warranting human arbitration. We validate this approach through a ground truth corpus of 49 annotated prompt/response pairs across five regulatory criteria, evaluated by four small language models (SLMs, 1.7B-7B parameters) running fully on-premise. Agreement rates range from 51.5% (mistral:7b) to 69.1% (phi4-mini), with no single model dominating across all criteria - empirically motivating the Profile-as-jury design. We further document three structural failure modes in small regulatory judges and a judge-specific position bias that degrades agreement by up to 25 percentage points across three question-order conditions (original, reversed, permuted). govllm is released as open-source software to support reproducible AI governance research.

2605.24733 2026-05-26 cs.CL

StepGap: A Hybrid NLI-LLM Checker for Step-Level Evidence-Gap Detection in Multi-Hop Question Answering

StepGap：一种用于多跳问答中步骤级证据缺口检测的混合NLI-LLM检查器

Yuelyu Ji, Zhuochun Li, Hui Ji, Daqing He

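Affiliations: School of Computing and Information, University of Pittsburgh

AI summary: Proposes StepGap, a hybrid NLI-LLM decision tree that detects step-level evidence gaps in multi-hop question answering and emits one of three typed labels; it reaches sF1 = 72.0 on 82 questions and, used as a GRPO process reward, improves the model's Exact Match.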

AI中文摘要

我们提出\textbf{StepGap}，一种混合NLI-LLM决策树，用于检测多跳问答中的步骤级证据缺口，并输出三类标签：\textsc{矛盾声明}（CC）、\textsc{无关证据}（IE）或\textsc{缺失桥梁}（MB），每个标签对应具体的修复动作。在82个多跳问题（181个标注步骤，$κ{=}0.704$）上，StepGap达到sF1$=$72.0，处于纯LLM基线（70.1）的bootstrap置信区间内，但具有更可分解的结构：移除StepGap的每个阶段都会\emph{降低}F1，而四个纯LLM移除中有三个\emph{提高}F1，这是\emph{竞争性错误抵消}的迹象，即内部阶段相互掩盖错误。我们进一步揭示了\emph{Q-F1陷阱}：问题级F1被标记每一步的检查器机械地膨胀，使得步骤级F1成为必要的诊断指标。作为带类型的GRPO过程奖励，StepGap将Qwen2.5-7B-Instruct的精确匹配率从$32.1{\pm}0.3$提升至$35.4{\pm}0.9$（三个种子），单次运行比较显示，与匹配的Search-R1 GRPO复现相比，平均EM增益为$+5.6$。

英文摘要

We present \textbf{StepGap}, a hybrid NLI-LLM decision tree that detects step-level evidence gaps in multi-hop QA and emits one of three typed labels: \textsc{Contradicted Claim} (CC), \textsc{Irrelevant Evidence} (IE), or \textsc{Missing Bridge} (MB), each tied to a concrete repair action. On 82 multi-hop questions (181 annotated steps, $κ{=}0.704$), StepGap reaches sF1$=$72.0, within the bootstrap confidence interval of an LLM-only baseline (70.1) but with a more decomposable structure: every StepGap stage \emph{hurts} F1 when removed, while three of four LLM-only removals \emph{improve} F1 -- a sign of \emph{competing-error cancellation}, where internal stages mask each other's errors. We further expose a \emph{Q-F1 trap}: question-level F1 is mechanically inflated by checkers that flag every step, making step-level F1 the necessary diagnostic. Used as a typed GRPO process reward, StepGap improves Qwen2.5-7B-Instruct Exact Match from $32.1{\pm}0.3$ to $35.4{\pm}0.9$ across three seeds, with the single-run comparison showing a $+5.6$ Avg EM gain over the matched Search-R1 GRPO reproduction.

2605.24726 2026-05-26 cs.CV

From Full Boards to Tiny Defects: Scale-Aware Tile Inference with Topology-Aware Merging for High-Resolution PCB Defect Detection

从整板到微小缺陷：面向高分辨率PCB缺陷检测的尺度感知瓦片推理与拓扑感知合并

Mohammad Alijanpour Shalmani, Alale Rezvani Boroujeni, Ali Amini, Jiann Shiun Yuan

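Affiliations: Dept. of Electrical and Computer Engineering; Dept. of Marketing; Centre of Real Time Computer Systems, Faculty of Informatics, Kaunas University of Technology

AI summary: To address micro-scale defects being lost when high-resolution PCB images are resized, proposes a scale-consistent training strategy for tile-based inference and a topology-aware merging method that needs no retraining, significantly improving defect detection accuracy.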

AI中文摘要

高分辨率印刷电路板（PCB）检测在将整板图像缩放到标准检测器输入时存在分辨率崩溃问题：微尺度缺陷缩小到几个像素而被遗漏。基于瓦片的推理保留了局部细节，但在瓦片边缘引入边界伪影，导致分割检测和假阴性。我们提出了五种推理策略的系统比较，在两个高分辨率PCB缺陷数据集PCB-Defect（230张图像，1704个标注）和HRIPCB（693张图像，2953个标注）上评估，涵盖六类缺陷。我们表明训练-推理尺度一致性至关重要：在全图像上训练的检测器在瓦片推理下mAP@50崩溃至0.01，而同一架构在640×640瓦片裁剪上训练时在两个数据集上分别达到0.72和0.94。我们进一步利用拓扑感知瓦片合并（TA-TM），一种无需训练的后处理方法，构建瓦片邻接图，并在全局NMS之前使用邻瓦片一致性调整边界敏感检测分数。在两个数据集中，添加128像素瓦片重叠将边界区域召回率从约26-63%提升至约70-100%，TA-TM在两个基准上均达到最佳mAP@50，且瓦片推理恢复了全图像方法完全遗漏的46-100%的小缺陷。结果在不同数据集上一致，证实了所提出策略的泛化性。TA-TM无需重新训练且架构无关，可直接应用于现有PCB检测流水线。

英文摘要

High-resolution printed circuit board (PCB) inspection suffers from resolution collapse when full-board images are resized to standard detector inputs: micro-scale defects shrink to a few pixels and are missed. Tile-based inference preserves local detail but introduces boundary artefacts at tile edges, causing split detections and false negatives. We present a systematic comparison of five inference strategies evaluated on two high-resolution PCB defect datasets, PCB-Defect (230 images, 1,704 annotations) and HRIPCB (693 images, 2,953 annotations), spanning six defect classes. We show that training-inference scale consistency is critical: a detector trained on full images collapses to mAP@50 = 0.01 under tile inference, while the same architecture trained on 640×640 tile crops achieves 0.72 and 0.94 on the two datasets respectively. We further exploit Topology-Aware Tile Merging (TA-TM), a training-free post-processing method that builds a tile-adjacency graph and adjusts boundary-sensitive detection scores using neighbour-tile agreement before global NMS. Across both datasets, adding 128 px tile overlap raises boundary-zone recall from ~26-63% to ~70-100%, TA-TM achieves the best mAP@50 on both benchmarks, and tile inference recovers 46-100% of small defects missed entirely by full-image methods. Results are consistent across datasets, confirming the generalizability of the proposed strategy. TA-TM requires no retraining and is architecture-agnostic, making it directly applicable to existing PCB inspection pipelines.
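To make the tiling geometry concrete, the sketch below lays out 640×640 tiles with a 128 px overlap, the sizes quoted in the abstract. It covers only the tile layout; TA-TM's adjacency graph and score adjustment are not reproduced. The function names and the choice to snap the last tile to the image border are assumptions made for illustration.

```python
def tile_starts(length, tile=640, overlap=128):
    """Start offsets of overlapping tiles that cover [0, length) along one axis."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    if length <= tile:
        return [0]  # a single (possibly padded) tile covers the axis
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # snap the last tile to the image border
    return starts

def tile_boxes(width, height, tile=640, overlap=128):
    """(x0, y0, x1, y1) boxes for every tile of a width x height image."""
    return [(x, y, x + tile, y + tile)
            for y in tile_starts(height, tile, overlap)
            for x in tile_starts(width, tile, overlap)]
```

With these defaults the stride is 512 px, so a 1664 px axis is covered by tiles starting at 0, 512, and 1024. Any defect smaller than the overlap that straddles one tile edge is fully contained in the neighbouring tile, which is the situation a merging step then has to reconcile.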

2605.24722 2026-05-26 cs.CV

Calibrating Probabilistic Object Detectors with Annotator Disagreement

校准具有标注者分歧的概率目标检测器

Zhi Qin Tan, Owen Addison, Yunpeng Li

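Affiliations: Faculty of Dentistry, Oral & Craniofacial Sciences, King's College London, London, United Kingdom

AI summary: For annotator disagreement caused by object ambiguity in object detection, proposes a way to calibrate probabilistic object detectors without ground truth annotations: it designs classification and localization calibration error metrics together with train-time and post-hoc calibrators so that predicted uncertainty matches the annotation distribution.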

AI中文摘要

对于模糊物体（例如医学图像），标注者之间可能存在高度分歧，这凸显了在目标检测任务中建立真实标注的挑战。尽管如此，所有现有的目标检测器都隐式地需要访问真实标注以进行训练或评估。我们针对的基本问题是：如何利用多个标注者的标注（但缺乏因物体模糊性导致的客观真实标注）来学习目标检测器，以及如何使学习到的检测器在检测模糊物体时表达有意义的模型预测不确定性？为了回答这些问题，我们提出了一种可解释的方法来校准概率目标检测器，其校准目标是将类别置信度和边界框方差估计与标注者的标注分布对齐。我们引入了一个高效且有效的框架来校准概率目标检测器，通过设计四个评估指标来衡量分类和定位的校准误差，并提出了一种训练时校准和后处理校准器，所有这些都无需访问任何真实标注。该框架可推广到许多现有的概率目标检测器，例如YOLO系列和两阶段检测器。在医学和自然图像的真实世界和合成数据集上的实验结果表明，所提出的框架与三种流行的目标检测器相结合具有优越的性能。

英文摘要

High degrees of disagreement among annotators can exist for ambiguous objects, e.g. in medical images, underscoring the challenges of establishing ground truth annotations in object detection tasks. Despite this, all existing object detectors implicitly require access to ground truth annotations for either training or evaluation. The fundamental questions we target are: How can we learn an object detector with multiple annotators' annotations but without objective ground truth annotations due to object ambiguity, and how can we enable the learned detector to express meaningful model predictive uncertainties in detecting ambiguous objects? To answer these questions, we present an interpretable approach to calibrate probabilistic object detectors, where the calibration goal is to align the class confidence and bounding box variance estimates to the annotators' annotation distribution. We introduce an efficient yet effective framework to calibrate probabilistic object detectors by designing four evaluation metrics to measure calibration errors regarding classification and localization, and proposing a train-time calibration and post-hoc calibrator, all without the need to access any ground truth. This framework is generalizable to many existing probabilistic object detectors, such as the YOLO families and two-stage detectors. Empirical results with real-world and synthetic datasets of medical and natural images demonstrate the superior performance of the proposed framework with three popular object detectors.

2605.24721 2026-05-26 cs.CL

ROC Analysis for Evaluating Translation Quality Estimation Systems

ROC分析用于评估翻译质量估计系统

Evelyn Y. Garland, Carola F. Berger

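Affiliations: Acta-Transphere; CFB Scientific Translations LLC

AI summary: Proposes receiver operating characteristic (ROC) analysis for evaluating automatic translation quality estimation (QE) systems; the method agrees with existing approaches and offers actionable performance insight for business decisions.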

Comments 16 pages, 8 PNG figures, 3 tables, uses acl.sty

2605.24718 2026-05-26 cs.CL

The Tokenizer Tax Across 25 European Languages: Domain Invariance, Cross-Lingual Few-Shot Effects, and the Ukrainian Penalty

25种欧洲语言的Tokenizer税：领域不变性、跨语言少样本效应与乌克兰语惩罚

Volodymyr Ovcharov

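Affiliations: LEX AI Platform; legal.org.ua; Kyiv, Ukraine

AI summary: Measures tokenizer fertility for 10 foundation models across 25 European languages, reveals how cost grows from English to other languages, and finds that Ukrainian pays an extra cost because it is underrepresented in pre-training data.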

Comments 16 pages, 3 figures, 8 tables. Dataset: https://huggingface.co/datasets/overthelex/tokenizer-fertility-map

AI中文摘要

Tokenizer生育率（每词token数）对非英语NLP施加了隐藏成本。我们在平行文本上测量了10个基础模型在25种欧洲语言上的生育率，生成了首个受控的欧洲tokenizer税地图。该税从英语（1.2 tokens/词）到希腊语/马耳他语（约3.1）跨度达2.5倍，遵循清晰层次：罗曼语族（1.5-1.7）、日耳曼语族（1.7-1.9）、斯拉夫语族（2.2-2.5）、乌拉尔语系/波罗的语族（2.7-3.0）。乌克兰语（2.7）比同源斯拉夫语言多支付15-18%，反映了其在预训练数据中的代表性不足。生育率排名在三种文本语域中具有领域不变性（rho > 0.97）。子词分析表明，高生育率tokenizer会碎片化形态边界而非保留它们。对四种斯拉夫语言的跨语言少样本评估显示，少样本效应是模型固有的，而非语言依赖的。我们将所有测量结果作为公共数据集发布。

英文摘要

Tokenizer fertility (the number of tokens per word) imposes a hidden cost on non-English NLP. We measure fertility for ten foundation models across 25 European languages on parallel text, producing the first controlled tokenizer tax map for the continent. The tax spans 2.5x from English (1.2 tokens/word) to Greek/Maltese (~3.1), following a clear hierarchy: Romance (1.5-1.7), Germanic (1.7-1.9), Slavic (2.2-2.5), Uralic/Baltic (2.7-3.0). Ukrainian (2.7) pays 15-18% more than cognate Slavic languages, reflecting underrepresentation in pre-training data. Fertility rankings are domain-invariant across three text registers (rho > 0.97). A subword analysis reveals that high-fertility tokenizers fragment morphological boundaries rather than preserving them. Cross-lingual few-shot evaluation on four Slavic languages shows that few-shot effects are model-intrinsic, not language-dependent. We release all measurements as a public dataset.
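Fertility as defined above is a ratio of token count to word count, which the following sketch computes. It assumes words are whitespace-separated, which may differ from the paper's word segmentation, and it uses a toy `chunk_tokenizer` as a stand-in for a real subword tokenizer; both function names are hypothetical.

```python
def chunk_tokenizer(text, max_len=4):
    """Toy stand-in for a subword tokenizer: cut each word into pieces
    of at most max_len characters."""
    return [word[i:i + max_len]
            for word in text.split()
            for i in range(0, len(word), max_len)]

def fertility(text, tokenize):
    """Tokens per whitespace-separated word for the given tokenize callable."""
    words = text.split()
    if not words:
        raise ValueError("text contains no words")
    return len(tokenize(text)) / len(words)

print(fertility("tokenizer tax", chunk_tokenizer))  # 4 tokens / 2 words = 2.0
```

Any callable that maps a string to a list of tokens can be passed as `tokenize`, so the same function could wrap a real model tokenizer when comparing languages on parallel text.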

2605.24712 2026-05-26 cs.LG cs.HC

Hardware-Aware Federated Learning for Speech Emotion Recognition

面向语音情感识别的硬件感知联邦学习

Beyazit Bestami Yuksel, Emrah Dikbiyik

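Affiliations: Computer Engineering; Istanbul Technical University; Department of Computer Technologies; Istanbul University-Cerrahpaşa

AI summary: Proposes a hardware-aware federated learning framework that combines hardware profiling, Top-K client selection, and adaptive local epochs for emotion recognition on the IEMOCAP dataset, cutting training time by about 36.5% and communication cost by 40% relative to FedAvg.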

Comments 4 pages, 3 figures, 4 Tables

2605.24710 2026-05-26 cs.LG math.PR math.ST stat.ML stat.TH

Feature Learning in Wide Neural Networks under $μ$P: Identifiability and Sparse-Dictionary Decomposition of the Mean-Field Limit

μP 下宽神经网络中的特征学习：平均场极限的可辨识性与稀疏字典分解

Akmal Xodarev

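Affiliations: Independent Researcher

AI summary: Under the Maximal Update Parametrization (μP), establishes four structural results for feature learning in wide two-layer neural networks: global existence and uniqueness of the mean-field limit, a characterization of identifiability, a sparse-dictionary decomposition, and a decomposition of the total feature-learning error; it also identifies the natural learning cell of the architecture-data pair.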

Comments 86 pages

AI中文摘要

我们在最大更新参数化（$μ$P）下，为宽两层神经网络中的特征学习建立了四个结构结果。第一，我们证明了在$μ$P下带噪声梯度下降的平均场极限的全局存在唯一性，确定了初始化矩序列上的最大可容许权重$w^*$作为参数-矩增长边界的倒数，从而也是流传播的最大加权矩类。有限粒子近似具有关于时间的均匀平方Wasserstein速率$O(N^{-1})$。第二，我们刻画了平均场极限的可辨识性：两个可容许参数测度在$L^2$中诱导相同的网络函数当且仅当它们的活跃分量在模去架构的有限秩实现对称性后一致。轨道深度$D^*_{\mathrm{orb}}$与矩簇深度$D^*_{\mathrm{var}}$不同。第三，在Barron-Hermite目标条件下，长时间极限测度的活跃支撑集允许一个稀疏字典分解：它在模去有限秩实现对称性后至多支撑在$S^*$个原子上，其中$S^*$由一个显式的系数阈值数界定。第四，我们将总特征学习误差分解为统计、优化、混沌传播和稀疏残差分量，其中目标相关的Hermite/Barron尾部取代了任何仅初始化的残差。这四个结果通过一个架构恒等式联系在一起：三元组$(w^*, D^*_{\mathrm{orb}}, S^*)$——最大可容许权重、轨道可辨识深度以及目标可实现时的稀疏字典深度——是架构-数据对$(\sigma, \rho)$的自然学习单元。证明是自包含的，除了来自$μ$P和平均场Langevin理论的标准结果。

英文摘要

We establish four structural results for feature learning in wide two-layer neural networks under the Maximal Update Parametrization ($μ$P). First, we prove global existence and uniqueness of the mean-field limit of noisy gradient descent under $μ$P, identifying the maximal admissible weight $w^*$ on the moment sequence of the initialization as the reciprocal parameter-moment-growth boundary, and hence the largest weighted moment class propagated by the flow. The finite-particle approximation has uniform-in-time squared-Wasserstein rate $O(N^{-1})$. Second, we characterize identifiability of the mean-field limit: two admissible parameter measures induce the same network function in $L^2$ exactly when their active components agree modulo the finite-rank realization symmetry of the architecture. The orbit depth $D^*_{\mathrm{orb}}$ is separated from the moment-variety depth $D^*_{\mathrm{var}}$. Third, under the Barron-Hermite target condition the active support of the long-time limit measure admits a sparse-dictionary decomposition: it is supported on at most $S^*$ atoms modulo finite-rank realization symmetry, with $S^*$ bounded by an explicit coefficient-threshold number. Fourth, we derive the total feature-learning-error decomposition into statistical, optimization, propagation-of-chaos, and sparse-residual components, with a target-dependent Hermite/Barron tail replacing any initialization-only residual. The four results are tied together by an architectural identity: the triple $(w^*, D^*_{\mathrm{orb}}, S^*)$ -- the maximal admissible weight, the orbit identifiability depth, and the sparse-dictionary depth at which the target is realizable -- is the natural learning cell of the architecture-data pair $(σ, ρ)$. The proofs are self-contained except for standard results from $μ$P and mean-field Langevin theory.

2605.24709 2026-05-26 cs.LG

Streaming Reinforcement Learning under Partial Observability with Real-Time Recurrent Learning

部分可观测下的流式强化学习与实时循环学习

Noah Farr, Aryaman Reddi, Carlo D'Eramo, Jan Peters

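Affiliations: Technical University of Darmstadt; University of Würzburg; German Research Center for AI (DFKI); Zuse School

AI summary: Uses recurrent trace units (RTUs) to perform exact real-time recurrent learning (RTRL) with time and memory complexity linear in the parameter count, removing the gradient-computation bottleneck for streaming reinforcement learning under partial observability while sustaining performance on discrete and continuous control tasks.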

Comments 16 pages, 4 figures

AI中文摘要

流式强化学习已成为一种在线学习范式，它符合自然学习代理的约束，即增量处理数据（批大小为1，无回放缓冲区）。虽然流式RL最近在完全可观测下通过深度函数逼近实现了扩展，但部分可观测设置仍然难以实现。在流式设置下，截断式时间反向传播退化为一步梯度视野，而精确的实时循环学习则代价过高。我们使用递归迹单元（一种对角递归架构，能够在参数数量上实现线性时间和内存复杂度的精确RTRL）来弥合这一差距，并展示它们能够干净地集成到现有的流式算法中，适用于离散和连续控制。在链长从2到128的MemoryChain诊断任务中，我们的方法保持了性能，而使用前馈、GRU和RTU网络的流式TBPTT(1)基线则崩溃。在五个POPGym任务和部分可观测的MuJoCo连续控制中，流式方法在POPGym上与批量PPO竞争，并在掩码MuJoCo上恢复了批量性能的很大一部分，尽管没有使用回放缓冲区或批量更新。

英文摘要

Streaming reinforcement learning has emerged as an online learning paradigm that conforms to the restrictions of natural learning agents that process data incrementally, i.e. with a batch size of 1 and no replay buffer. While streaming RL has recently been shown to scale with deep function approximation with full observability, partially observable settings have remained out of reach. Truncated backpropagation through time collapses to a one-step gradient horizon under the streaming setting, and exact real-time recurrent learning is prohibitively expensive. We close this gap using recurrent trace units, a diagonal recurrent architecture that enables exact RTRL with linear time and memory complexity in the parameter count, and show that they integrate cleanly into existing streaming algorithms across both discrete and continuous control. On a MemoryChain diagnostic with chain lengths from 2 to 128, our method sustains performance where streaming TBPTT(1) baselines using feedforward, GRU, and RTU networks collapse. On five POPGym tasks and on partially observable MuJoCo continuous control, the streaming approach is competitive with batched PPO on POPGym and recovers a substantial fraction of batched performance on masked MuJoCo, despite using no replay buffer or batched updates.
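Why a diagonal recurrence makes exact RTRL cheap can be seen in a stripped-down case. The sketch below assumes a real-valued, linear, diagonal recurrence h_t = a ⊙ h_{t-1} + x_t with a squared-error loss at every step; the paper's recurrent trace units are not reproduced here, and the function name is hypothetical. Because unit i depends only on its own past, the sensitivity ∂h_t/∂a is a vector rather than a full Jacobian, so the trace costs O(n) time and memory per step.

```python
import numpy as np

def rtrl_diagonal(a, xs, ys):
    """Loss and exact dL/da for h_t = a * h_{t-1} + x_t with
    L = sum_t 0.5 * ||h_t - y_t||^2, computed online in one forward pass."""
    a = np.asarray(a, dtype=float)
    h = np.zeros_like(a)      # hidden state
    e = np.zeros_like(a)      # e[i] = d h_t[i] / d a[i]; cross terms are zero
    grad = np.zeros_like(a)
    loss = 0.0
    for x, y in zip(xs, ys):
        e = h + a * e         # uses h_{t-1}, so update the trace before the state
        h = a * h + x
        err = h - y
        loss += 0.5 * float(err @ err)
        grad += err * e       # gradient accumulated without storing history
    return loss, grad
```

No past states are stored and no backward pass is run, which is what a batch-size-1, no-replay-buffer setting needs; the gradient matches what full backpropagation through time would give for this recurrence.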
