2605.28184 2026-05-28 cs.LG

Joint Training of Multi-Token Prediction in Reinforcement Learning via Optimal Coefficient Calibration

Zili Wang, Jiajun Chai, Lin Chen, Xiaohan Wang, Shiming Xiang, Guojun Yin

Affiliations: School of Artificial Intelligence, University of Chinese Academy of Sciences; MAIS, Institute of Automation, Chinese Academy of Sciences; Meituan

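AI summary: This paper analyzes, from an optimization perspective, why joint training of multi-token prediction with reinforcement learning fails, and proposes Optimal Coefficient Calibration, which improves performance by tracking the optimal coefficient online.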

Abstract

Reinforcement Learning from Verifiable Rewards (RLVR) has emerged as the standard paradigm for improving the reasoning capability of large language models, while Multi-Token Prediction (MTP) has been a widely adopted module in pretraining. Combining them is a natural approach, yet current RL practices detach MTP gradients because joint training degrades performance. We revisit this failure from an optimization perspective. We show that the per-step effect of MTP on the RL objective can be decomposed into two terms: a first-order correlation and a second-order perturbation penalty. This decomposition unifies three MTP training regimes: Detach, Cross-Entropy loss, and Policy loss, and explains why each succeeds or fails. Further analysis of policy loss reveals that, although it aligns with intuition, performance still degrades: the correlation term decays while the quadratic penalty persists. Guided by the analysis, we propose Optimal Coefficient Calibration (OCC), an adaptive scheme that tracks the optimal coefficient online via a log-probability proxy at negligible cost. Across six competition-level mathematical reasoning benchmarks, OCC consistently matches or exceeds the detach baseline, delivering improved joint MTP-RL training performance.

2605.28181 2026-05-28 cs.CL

When Confidence Misleads: Suffix Anchoring and Anchor-Proximity Confidence Modulation for Diffusion Language Models

Jungwon Park, Jimyeong Kim, Jungmin Ko, Nojun Kwak, Wonjong Rhee

Affiliations: RICS; AIIS; IPAI; Department of Intelligence and Information; Daegu Gyeongbuk Institute of Science and Technology

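AI summary: To address misleading confidence in diffusion language models, which causes incomplete generation or premature decoding, the paper proposes suffix anchoring with anchor-proximity confidence modulation, improving fully non-autoregressive decoding without any training.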

Comments Preprint

Abstract

Diffusion language models decode text by iteratively denoising masked token sequences, making the choice of which positions to decode a central inference-time decision. Most training-free decoding strategies use model confidence for position selection, assuming that high-confidence positions are ready to be decoded. In this work, we revisit this assumption by studying when confidence misleads fully non-autoregressive (fully non-AR) decoding. EOT tokens can receive high confidence and cause incomplete generation; inserting a suffix anchor can mitigate this issue but introduces local overconfidence near the anchor, causing anchor-adjacent tokens to be decoded too early. To address these issues, we propose Suffix-Anchored Confidence Modulation, a simple training-free method that inserts a short suffix anchor to encourage response completion and modulates confidence near the anchor according to decoding progress. This preserves the response-completion benefit of suffix anchoring while reducing premature decoding of anchor-adjacent tokens. Across text-only reasoning, vision-language reasoning, and code-generation benchmarks, our method consistently improves confidence-based fully non-AR decoding, outperforms explicit EOT suppression, and preserves the parallel decoding advantage of fully non-AR generation.
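The position-selection step described in this abstract can be illustrated with a minimal sketch. `select_positions` is the usual confidence-based top-k choice among masked positions. `damp_near_anchor` is a purely hypothetical modulation rule (the names, the `radius` and `floor` parameters, and the linear schedule are assumptions, not the paper's formula): it scales down confidence just before the suffix anchor early in decoding and relaxes as progress approaches 1.

```python
def select_positions(confidences, masked, k):
    """Return the k masked positions with the highest confidence, in index order."""
    ranked = sorted(masked, key=lambda i: confidences[i], reverse=True)
    return sorted(ranked[:k])


def damp_near_anchor(confidences, anchor_start, progress, radius=2, floor=0.5):
    """Hypothetical rule: scale confidence of the `radius` positions just before
    the anchor by a factor that grows linearly from `floor` (progress = 0)
    to 1.0 (progress = 1)."""
    scale = floor + (1.0 - floor) * progress
    damped = list(confidences)
    for i in range(max(0, anchor_start - radius), anchor_start):
        damped[i] = confidences[i] * scale
    return damped


# Toy example: five response positions, suffix anchor starting at index 5.
confidences = [0.2, 0.9, 0.4, 0.95, 0.97]
masked = [0, 1, 2, 3, 4]
early = damp_near_anchor(confidences, anchor_start=5, progress=0.0)
print(select_positions(confidences, masked, k=2))  # anchor-adjacent tokens win
print(select_positions(early, masked, k=2))        # damping defers one of them
```

In this toy setting the undamped selection picks the two anchor-adjacent positions first, while the damped version defers one of them until later steps.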

2605.28179 2026-05-28 cs.CL

SuperValid: Capability-Aligned OOD Validation for Generalizable Downstream Scaling

Quanen Sun, Changxin Tian, Ke Shi, Cai Chen, Cunyin Peng, Jia Liu, Kunlong Chen, Zhiqiang Zhang

Affiliations: Ant Group

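AI summary: Proposes the SuperValid framework, which synthesizes capability-aligned out-of-distribution validation data by distilling core concepts from benchmarks and expanding them into diverse, knowledge-rich texts, so that downstream performance can be predicted at the capability level for effective model selection, early stopping, and scaling decisions.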

Abstract

Scaling laws guide large language model training by relating compute to cross-entropy loss, and recent work further extends them to predict downstream benchmark performance. However, prior approaches face generalization limitations in two respects: focusing on benchmark-level performance introduces scenario-specific artifacts, while relying on IID validation loss fails to track capability improvements when training distributions vary. In this work, we argue that downstream scaling should be studied at the capability level, which captures shared skill factors across related tasks while abstracting away benchmark-specific noise. We propose SuperValid, a framework that synthesizes OOD (out-of-distribution), capability-aligned validation data by distilling core concepts from benchmarks within a capability domain and expanding them into diverse, knowledge-rich texts. Extensive experiments spanning 17 benchmarks grouped into 6 capability domains show that SuperValid loss exhibits strong and stable correlation with downstream performance across models of different architectures, scales, and training data distributions. As a training-free metric computable during training without benchmark evaluation, SuperValid enables effective model selection, early stopping, and scaling decisions.

2605.28176 2026-05-28 cs.CV

From Kellgren-Lawrence to Calcium Pyrophosphate Crystal Deposition: A Soft-Labelling Framework for Knee Osteoarthritis Assessment

Francisco Bérchez-Moreno, Riccardo Rosati, Maria Chiara Fiorentino, Víctor M. Vargas, Edoardo Cipolletta, Emilio Filippucci, Luca Romeo, Pedro A. Gutiérrez, César Hervás-Martínez

Affiliations: Department of Political Science, Communication and International Relations, University of Macerata, Macerata, Italy; Department of Economics and Law, University of Macerata, Macerata, Italy; Department of Innovative Technologies in Medicine & Dentistry, Università degli Studi "G. D'Annunzio" Chieti - Pescara, Chieti, Italy; Department of Internal Medicine, Azienda Ospedaliero Universitaria delle Marche, Ancona, Italy; Academic Rheumatology, University of Nottingham, Nottingham, UK; Department of Rheumatology, Polytechnic University of Marche, Ancona, Italy

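AI summary: Proposes an ordinal deep learning framework based on soft labelling that replaces one-hot encoding with unimodal probability distributions, handling both the ordinal uncertainty of KL and CPPD grading and the asymmetric relationship between the two scales, and significantly improving grading performance on knee X-ray images.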

Abstract

Background and objective. Conventional Deep Learning (DL) approaches for Knee Osteoarthritis (KOA) grading rely on one-hot labels, which fail to capture both the ordinal uncertainty of Kellgren-Lawrence (KL) and Calcium Pyrophosphate Deposition Disease (CPPD) severity scores and the asymmetric relationship between the two scales observed in clinical practice. Methods. We retrospectively collected 2172 knee X-ray images, including 968 radiographs jointly annotated for KL and CPPD severity. An ordinal DL framework based on soft-labelling was developed for both tasks, replacing one-hot targets with unimodal probability distributions centred on the annotated grade. Four formulations were investigated: binomial, beta, triangular, and exponential. Results. All soft-labelling strategies consistently outperformed the nominal baseline. For CPPD grading, the triangular formulation achieved the highest Quadratic Weighted Kappa (QWK) and the lowest Mean Absolute Error (MAE) (QWK = 0.796; MAE = 0.438), while the beta formulation yielded the most balanced class-wise performance considering Average MAE (AMAE) and Maximum MAE (MMAE) across classes (AMAE = 0.458; MMAE = 0.573). For KL grading, the beta-based approach provided the best overall performance, achieving the highest QWK together with the lowest MAE and class-wise errors (QWK = 0.777; MAE = 0.529; AMAE = 0.523; MMAE = 0.775). Statistical analysis demonstrated significant improvements over conventional one-hot supervision (p < 0.001).
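To make the idea of a unimodal soft label concrete, here is a minimal sketch of the binomial formulation. The parameter choice p = (2k + 1) / (2K), which places the mode of Binomial(K - 1, p) on grade k, is an assumption made for illustration; the abstract does not give the exact parameterisation the authors use, and the function name is hypothetical.

```python
from math import comb


def binomial_soft_label(grade, num_classes):
    """Unimodal soft target over ordinal classes 0..num_classes-1.

    Assumption: p = (2*grade + 1) / (2*num_classes), so the mode of
    Binomial(num_classes - 1, p) falls on the annotated grade.
    """
    n = num_classes - 1
    p = (2 * grade + 1) / (2 * num_classes)
    return [comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(num_classes)]


# KL grades run from 0 to 4, i.e. five ordinal classes.
for grade in range(5):
    probs = binomial_soft_label(grade, num_classes=5)
    print(grade, [round(v, 3) for v in probs])
```

Compared with a one-hot target, most of the probability mass stays on the annotated grade while neighbouring grades receive more mass than distant ones, which is what lets the loss penalise a two-grade error more than a one-grade error.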

2605.28174 2026-05-28 cs.CV cs.AI

FLORO: A Multimodal Geospatial Foundation Model for Ecological Remote Sensing Across Sensors and Scales

Jorge L. Rodriguez, Victor Angulo Morales, Areej Alwahas, Mariana Elias Lara, Fida Mohammad Thoker, Kasper Johansen, Bernard Ghanem, Fernando T. Maestre, Matthew F. McCabe

Affiliations: Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology; Computer, Electrical and Mathematical Science and Engineering Division, King Abdullah University of Science and Technology

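AI summary: Proposes FLORO, a multimodal geospatial foundation model pretrained with masked autoencoding on heterogeneous remote sensing data; availability-aware inputs unify heterogeneous sensor configurations, and the model achieves strong transfer performance on the PANGAEA benchmark.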

Comments 29 pages, 9 figures

Abstract

Foundation models offer a promising route to transferable remote sensing representations, but many current approaches depend on very large pretraining datasets and fixed sensor configurations, limiting their suitability for ecological and environmental applications, where observations often vary across platforms, spatial and spectral resolutions, and available modalities. We introduce FLORO, a multimodal geospatial foundation model designed to learn transferable representations from a small but highly diverse remote sensing corpus. FLORO is pretrained using masked autoencoding on a heterogeneous combination of Sentinel-1, Sentinel-2, SkySAT imagery, elevation, and UAV-derived data. To accommodate sensor variability, FLORO incorporates availability-aware inputs that indicate which spectral bands and auxiliary modalities are present in each sample, enabling a unified input space across heterogeneous sensor configurations. We evaluated FLORO on the PANGAEA benchmark under a frozen-encoder protocol across scene classification, segmentation, and regression tasks. Despite being pretrained on a smaller corpus than competing foundation models, FLORO achieved strong and stable transfer across optical, optical-SAR, and optical-elevation benchmarks spanning medium-resolution satellite, airborne, and ultra-high-resolution UAV imagery. FLORO obtained the second-best average segmentation performance across six PANGAEA benchmarks, trailing only a recently introduced foundation model pretrained on over two orders of magnitude more images, remained competitive on scene classification, and was robust in regression tasks, while qualitative results showed improved preservation of spatial structure in flood, urban, biomass, and canopy-height prediction settings. In a separate controlled experiment on EuroSAT-MS, geo-positional encoding further improved classification relative to absolute positional encoding.

2605.28173 2026-05-28 cs.CV

Unification and Optimization of Robust Supervised Learning

Jonas Hanselle, Valentin Margraf, Clemens Damke, Eyke Hüllermeier

Affiliations: LMU Munich, MCML

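AI summary: Proposes a unified framework that organizes robust learning methods such as distributionally robust optimization, label smoothing, vicinal risk minimization, and Mixup along three design axes, and uses joint hyperparameter optimization to automatically compose a robustness strategy suited to the task.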

Abstract

The literature has proposed various robust alternatives to empirical risk minimisation to address failure modes such as distribution shift, label noise and finite-sample degeneracies. Examples include distributionally robust optimization, label smoothing, vicinal risk minimization, and Mixup. However, such approaches are typically developed in isolation, forcing practitioners to commit a priori to a single failure mode even when the dominant mode for the task is unclear. To address this, we organize a broad class of existing methods along three common design axes and derive a tractable training procedure that decomposes robust learning into sequential stages (reference distribution enrichment, input-space perturbation, label-space perturbation, and sample-level aggregation), each with a choice of stance (pessimistic, neutral, or optimistic). This results in a unified design space in which joint hyperparameter optimization can compose and configure robustness strategies suited to the task at hand. Across tabular, image, and reward modeling benchmarks, joint hyperparameter optimization is competitive with the best single-method baseline in each setting, offering a reliable default for practitioners who do not know a priori which failure mode dominates their task.
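Two of the methods named in this abstract, label smoothing and Mixup, are easy to state in code. The sketch below shows their standard textbook forms on plain Python lists; it is not the paper's unified training procedure, and the function names are chosen here for illustration.

```python
import random


def smooth_label(one_hot, epsilon):
    """Label smoothing: move a fraction epsilon of the mass to a uniform prior."""
    k = len(one_hot)
    return [(1.0 - epsilon) * y + epsilon / k for y in one_hot]


def mixup(x_i, y_i, x_j, y_j, lam):
    """Mixup: convex combination of two samples and of their label vectors."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x, y


def sample_lambda(alpha, rng=random):
    """Mixing weight drawn from Beta(alpha, alpha), as in the usual Mixup recipe."""
    return rng.betavariate(alpha, alpha)


print(smooth_label([0.0, 1.0, 0.0], epsilon=0.1))
print(mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], lam=0.25))
```

Label smoothing perturbs only the label, while Mixup perturbs inputs and labels together, which is one way to see why a shared design space with separate input-space and label-space stages is natural.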

2605.28163 2026-05-28 cs.CL cs.AI

Feedforward 3D Editing Learns from Semantic-Part Transformations

Jiawei Weng, Saining Zhang, Zhenxin Diao, Peishuo Li, Henghaofan Zhang, Junhao Chen, Hao Zhao

Affiliations: Nanyang Technological University; Tsinghua University

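AI summary: Proposes the Pxform dataset and the PartFlow network, which achieve high-quality feedforward 3D editing through semantic-part transformations and reach state-of-the-art performance on geometric and appearance editing benchmarks.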

Comments 31 pages, 22 figures. Project Page: https://dennis-jwweng.github.io/pxform/

Abstract

3D editing is a fundamental capability for scalable 3D content creation. While image editing has rapidly evolved toward large-scale feedforward generative paradigms, 3D AI generation remains dominated by training-free editing pipelines. A central challenge of feedforward 3D editing lies in the lack of high-quality paired supervision. Editable 3D assets require simultaneous preservation of geometry, multi-view consistency, structural coherence, and localized edit controllability. Existing 3D editing datasets often rely on independently generated assets, image-mediated reconstruction or narrow edit taxonomies, leading to inaccurate localization, weak preservation, blurred edit boundaries, and limited semantic consistency. In this work, we introduce a new perspective: scalable feedforward 3D editing should be learned from semantic-part transformations. Based on this insight, we propose Pxform, a high-quality 3D editing dataset with over 100K consistent before/after editing pairs across seven edit types. Instead of treating objects as unstructured shapes, our pipeline grounds edits directly in semantic 3D parts. Built upon Pxform, we further propose PartFlow, a feedforward 3D editing network that injects source-aware latent control into pretrained 3D generative priors. PartFlow introduces mask-aware velocity preservation and render-space consistency supervision to jointly improve edit fidelity and source preservation, while requiring no 3D edit mask during inference. Extensive experiments demonstrate that high-quality semantic-part supervision substantially improves scalable 3D editing, enabling PartFlow to achieve state-of-the-art performance on both geometric and appearance editing benchmarks.

2605.27102 2026-05-28 cs.CV cs.LG

JLT: Clean-Latent Prediction in Latent Diffusion Transformers

Funing Fu, Tenghui Wang, Guanyu Zhou, Junyong Cen, Qichao Zhu

Affiliations: Independent Researcher; Wuhan University of Technology; Hangzhou Jiyi Artificial Intelligence Co., Ltd.

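AI summary: This paper proposes JLT, a 130M latent diffusion Transformer trained on frozen FLUX.2 VAE codes; clean-latent prediction yields a better FID than velocity prediction on ImageNet 256×256, indicating that the prediction target in latent diffusion is a representation-dependent geometric choice.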

Abstract

Flow matching with clean-data prediction has shown that regressing the clean point can exploit low-dimensional structure more effectively than predicting an ambient noised quantity. We ask whether this principle remains useful after images are mapped into a learned latent space, where compression has already removed much of the raw pixel variability. We introduce JLT, a 130M latent diffusion Transformer over frozen FLUX.2 VAE codes, and compare clean-latent prediction with a matched velocity-prediction DiT under the same representation, backbone, and training settings. Although the three variables x, epsilon, and v are linearly convertible for a fixed corruption time, a local Gaussian analysis shows that velocity regression inherits an isotropic target-covariance floor and amplifies low-variance latent directions, while clean prediction damps them. On ImageNet 256 x 256, JLT-B/1 obtains FID-50K 2.50 with classifier-free guidance, with a large matched-target gap over velocity prediction. These results suggest that prediction targets in latent diffusion are representation-dependent geometric choices, rather than interchangeable algebraic parameterizations.
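The statement that x, epsilon, and v are linearly convertible at a fixed corruption time can be checked with a few lines of arithmetic. The sketch below assumes the common linear interpolation path z_t = (1 - t) x + t * eps with velocity v = eps - x; the abstract does not state which convention JLT uses, so treat the formulas as one possible instance.

```python
def corrupt(x, eps, t):
    """Noised latent and velocity target under the assumed linear path."""
    z_t = (1.0 - t) * x + t * eps
    v = eps - x
    return z_t, v


def x_from_v(z_t, v, t):
    """Clean latent recovered from a velocity prediction."""
    return z_t - t * v


def eps_from_v(z_t, v, t):
    """Noise recovered from a velocity prediction."""
    return z_t + (1.0 - t) * v


def v_from_x(z_t, x, t):
    """Velocity recovered from a clean-latent prediction (requires t > 0)."""
    return (z_t - x) / t


x, eps, t = 2.0, -1.0, 0.25
z_t, v = corrupt(x, eps, t)
print(z_t, v)
print(x_from_v(z_t, v, t), eps_from_v(z_t, v, t), v_from_x(z_t, x, t))
```

Note the 1/t factor in `v_from_x`: an error in a predicted clean latent is rescaled when expressed as a velocity, so the two regression targets weight errors differently even though they are algebraically interchangeable.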

2605.26910 2026-05-28 cs.LG cs.AI q-bio.NC

EEG-FM-Audit: A Systematic Evaluation and Analysis Pipeline for EEG Foundation Models

Xianheng Wang, Yige Yang, Damien Coyle

Affiliations: Bath Institute for the Augmented Human; University of Bath

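AI summary: Proposes the EEG-FM-Audit pipeline, which systematically evaluates EEG foundation models through ASHA-driven benchmarking, paradigm-level ablation studies, and neurophysiological probing, and finds that tuned supervised baselines can match or outperform advanced foundation models.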

Comments 26 pages

Abstract

Large EEG Foundation Models (FMs) have shown great potential for decoding EEG signals across diverse cognitive tasks. However, existing EEG-FM studies exhibit three critical limitations: opaque supervised baseline tuning, unverified contributions of complex learning paradigms, and a lack of transparency in model decision-making. To address these, we propose EEG-FM-Audit, a comprehensive evaluation and analysis pipeline designed to systematize the assessment of EEG-FMs. EEG-FM-Audit consists of three primary components: (1) an ASHA-driven benchmarking protocol that ensures fair comparisons by transparently optimizing supervised baselines; (2) paradigm-level ablation studies to evaluate the effectiveness of learning paradigms in FMs; and (3) a neurophysiological probing (NPP) framework, which explores whether FMs leverage valid temporal, spatial, and spectral EEG properties. We apply EEG-FM-Audit to four state-of-the-art EEG-FMs and five representative supervised models across three public datasets. Our results reveal that properly tuned supervised baselines can match or outperform advanced FMs, despite requiring significantly fewer parameters. Furthermore, we find that the effectiveness of learning paradigms of FMs is highly dependent on dataset scale and architecture. Finally, NPP analysis demonstrates how FMs rely on specific physiological features, establishing a framework for more interpretable neural decoding.
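ASHA is an asynchronous variant of successive halving. As a rough illustration of the underlying idea only, the sketch below implements plain synchronous successive halving over a toy search space; the scoring function, the learning-rate grid, and all names are hypothetical and unrelated to the EEG models evaluated in the paper.

```python
import math


def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Synchronous successive halving: at each rung, score every surviving
    config at the current budget and keep the top 1/eta of them."""
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        survivors = ranked[: max(1, len(ranked) // eta)]
        budget *= eta  # survivors get eta times more training budget
    return survivors[0]


def toy_score(learning_rate, budget):
    """Hypothetical validation score: best near lr = 1e-2, improving with budget."""
    return -abs(math.log10(learning_rate) + 2.0) - 1.0 / budget


grid = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0]
print(successive_halving(grid, toy_score))
```

Applying the same tuning budget and pruning rule to every supervised baseline is what makes the comparison with foundation models transparent: no baseline is tuned more carefully than another.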

2605.26368 2026-05-28 cs.CV cs.AI

Implicit Null-Space Manifold Generation for Redundant Robotic Systems

Taiki Ishigaki, Teresa Vidal-Calleja, Ko Ayusawa, Eiichi Yoshida

Affiliations: Tokyo University of Science, Japan; University of Technology Sydney, Australia; National Institute of Advanced Industrial Science and Technology, Japan

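AI summary: For redundant robotic systems, proposes an implicit scalar field method based on Jacobian-guided exploration that represents the solution manifold as a zero-level set, enabling effective estimation of the solution space geometry and modeling of continuously varying tasks.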

Comments Corrected author names in references

Abstract

Robotic systems with redundant degrees of freedom can achieve the same task outcome using multiple configurations, resulting in solution sets that form manifolds in the configuration space. Existing approaches typically exploit such redundancy locally through Jacobian-based techniques to compute individual solutions or trajectories. While effective for solution computation, these methods do not retain a representation of the geometry of the solution set itself. In this work, we adopt a representation-centric approach to estimate the geometric structure of the solution space. We consider solution manifolds induced by general task-defining maps and construct an implicit scalar field over the configuration space, whose zero-level set corresponds to the solution manifold. To this end, we generate samples in the neighborhood of the solution manifold using a Jacobian-guided exploration strategy, which efficiently captures its local and global structure. The resulting implicit representation is defined over the configuration space and naturally induces a continuous distance field that encodes proximity to the solution manifold. Experiments on a planar three-link robot and a seven-degree-of-freedom Franka manipulator demonstrate the effectiveness of the proposed representation. Furthermore, the framework enables consistent modeling of solution spaces across families of tasks with continuous variation.
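The notion of a scalar field whose zero-level set is the solution manifold can be illustrated on the planar three-link robot mentioned in the abstract. The sketch below uses the task residual itself (distance between the end-effector and a target point) as the field; this is a hand-written stand-in, not the learned implicit representation the paper constructs, and unit link lengths are an assumption.

```python
import math


def forward_kinematics(q, link_lengths=(1.0, 1.0, 1.0)):
    """End-effector position of a planar serial arm with relative joint angles q."""
    x = y = angle = 0.0
    for joint, length in zip(q, link_lengths):
        angle += joint
        x += length * math.cos(angle)
        y += length * math.sin(angle)
    return x, y


def task_residual(q, target, link_lengths=(1.0, 1.0, 1.0)):
    """Scalar field over configuration space; it is zero exactly on the set of
    configurations whose end-effector reaches `target`."""
    x, y = forward_kinematics(q, link_lengths)
    return math.hypot(x - target[0], y - target[1])


target = (1.0, 0.0)
elbow_up = (math.pi / 2, -math.pi / 2, -math.pi / 2)
elbow_down = (-math.pi / 2, math.pi / 2, math.pi / 2)
print(task_residual(elbow_up, target), task_residual(elbow_down, target))
print(task_residual((0.0, 0.0, 0.0), target))
```

Both configurations lie on the zero-level set for the same target, which is the redundancy the abstract refers to; the fully stretched arm does not, and its residual measures how far its end-effector is from the target in task space.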
