arXivDaily: daily arXiv digest, updated Monday to Friday
All subject categories: 3860
2605.18567 2026-05-19 cs.CL cs.LG

GUT-IS: A Data-Driven Approach to Integrating Constructs and Their Relations in Information Systems

Maximilian Reinhardt, Jonas Scharfenberger, Burkhardt Funk

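AI summary: This paper proposes a data-driven approach that combines task-adapted text embeddings with clustering to generate a candidate set of construct groupings, then selects the optimal grouping with a loss function that explicitly trades off semantic purity against parsimony in the number of clusters. This makes it possible to analyze how construct groupings and their relations change as the priority shifts from purity to parsimony.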

Comments
Accepted at the 34th European Conference on Information Systems (ECIS 2026), Milan, Italy

English abstract

Structural equation modeling is widely used in IS research. However, inconsistent construct definitions impede the cumulative development of knowledge. In this work, we present an approach that aims at the integration of structural equation models into a unified model: We use a combination of task-adapted text embeddings and clustering to produce a candidate set of construct groupings. Subsequently, we select the optimal solution using a loss function that explicitly trades off semantic purity and parsimony in the number of clusters. By making this trade-off explicit, our approach makes it possible to analyze how construct groupings and their relations change as one shifts the priority from purity to parsimony. Empirically, we evaluate and explore the proposed methodology on two datasets from the IS domain.
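The abstract does not give the form of the loss. As a minimal sketch, assume a hypothetical linear trade-off between an impurity score for each candidate grouping and the normalized number of clusters; sweeping the weight `lam` then moves the selected grouping from the purest candidate toward the most parsimonious one. The function name and the candidate impurity values are invented for illustration.

```python
from typing import Dict


def select_grouping(impurity: Dict[int, float], lam: float) -> int:
    """Return the candidate cluster count k that minimizes a purity/parsimony loss.

    impurity maps each candidate number of clusters k to a semantic impurity
    score in [0, 1] (lower is purer). The loss below is a hypothetical linear
    trade-off, not the formula used in GUT-IS.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    k_max = max(impurity)

    def loss(k: int) -> float:
        # (1 - lam) rewards purity; lam rewards parsimony (fewer clusters).
        return (1.0 - lam) * impurity[k] + lam * (k / k_max)

    return min(impurity, key=loss)


# Hypothetical candidates: more clusters give purer groupings.
candidates = {5: 0.60, 20: 0.30, 50: 0.12, 100: 0.05}
for lam in (0.0, 0.5, 0.9):
    print(lam, select_grouping(candidates, lam))
```

Recording which constructs share a cluster at each `lam` would be one way to trace how relations between groups change along the trade-off.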

2605.18563 2026-05-19 cs.CL

Readers make targeted regressions to plausible errors in reanalysis of "noisy-channel garden-path" sentences

Thomas Hikaru Clark, Roger Levy, Edward Gibson

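AI summary: The study examines how readers processing "noisy-channel garden-path" sentences use later-arriving information to locate likely errors. It reveals targeted regressions in reading dynamics and discusses their implications for noisy-channel theories of language comprehension.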


English abstract

A key question in psycholinguistics is how inferences about the meaning of linguistic input unfold incrementally in a comprehender's mind. In this work, we study reading dynamics for "noisy-channel garden-path" sentences, which temporarily appear well-formed but feature late-appearing violations of expectation that can be resolved not by inferring an alternative syntactic structure, but by inferring the presence of an error. We find evidence for targeted regressions: eye movements towards regions that are promising loci of possible errors in light of later-arriving information, showing patterns consistent with the posterior inferences of a model of noisy-channel processing with reanalysis. We discuss the implications of these findings for theories of noisy-channel language comprehension and information-theoretic explanations of reading dynamics.
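A toy illustration of the kind of posterior inference the abstract refers to, assuming a simple noisy-channel model: each candidate intended sentence has a prior, and each edit the noise would have to introduce costs a factor `eps`. Summing posterior mass by edit position indicates where a reader who suspects an error would most profitably look back. The sentence, priors, `eps`, and the function name are hypothetical, not the authors' model or materials.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def error_location_posterior(
    candidates: List[Tuple[float, Tuple[int, ...]]], eps: float = 0.05
) -> Tuple[Dict[int, float], float]:
    """Posterior probability that each word position hosts a noise-induced error.

    candidates: (prior, edit_positions) per candidate intended sentence, where
    edit_positions lists perceived-word indices the noise must have altered.
    Assumed noise model: each edit occurs independently with probability eps.
    Returns (per-position posterior mass, posterior of the literal reading).
    """
    joint = [prior * eps ** len(edits) for prior, edits in candidates]
    z = sum(joint)
    by_position: Dict[int, float] = defaultdict(float)
    literal = 0.0
    for (_, edits), j in zip(candidates, joint):
        if not edits:
            literal += j / z
        for pos in edits:
            by_position[pos] += j / z
    return dict(by_position), literal


# Perceived: "The mother gave the candle the daughter" (word indices 0..6).
candidates = [
    (1e-4, ()),      # literal reading: implausible, needs no edits
    (1e-2, (5,)),    # intended "...the candle to the daughter": "to" lost before index 5
    (1e-2, (4, 6)),  # intended "...the daughter the candle": words 4 and 6 swapped
]
posterior, literal = error_location_posterior(candidates)
print(posterior, literal)  # most mass falls on index 5, the likely regression target
```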

2605.18557 2026-05-19 cs.LG cs.NE q-bio.NC

Self-supervised local learning rules learn the hidden hierarchical structure of high-dimensional data

Ariane Delrocq, Wu S. Zihan, Guillaume Bellec, Wulfram Gerstner

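AI summary: This paper studies self-supervised local learning rules on the Random Hierarchy Model. Rules of the first type fail, a failure traced to input-specific nonlinearities ("masking") that full backpropagation implements, whereas rules of the second type learn the hierarchical structure effectively while remaining data-efficient and biologically plausible.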


English abstract

The brain learns abstract representations of high-dimensional sensory input, but the plasticity rules that enable such learning are unknown. We study biologically plausible algorithms on the Random Hierarchy Model (RHM), an artificial dataset designed to investigate how deep neural networks learn the intrinsic hierarchical structure of high-dimensional data. We focus on two types of local learning rules that avoid both a long convergence time and the use of a symmetric error network. The first type uses direct feedback signals to approximate error propagation from the output layer. The second type uses layerwise self-supervised contrastive or non-contrastive loss functions that do not explicitly approximate errors at the output layer. We show that all rules of the first type fail to solve the tasks of the RHM and trace this failure back to input-specific nonlinearities ("masking") that are implemented in full backpropagation and are essential for learning complex tasks. However, algorithms of the second type are able to learn the hierarchical hidden structure of the RHM tasks and are as data-efficient as supervised backpropagation training, while being compatible with known rules of synaptic plasticity in cortex.
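The second family of rules relies on layer-wise self-supervised losses. As a minimal sketch, here is InfoNCE, one standard contrastive objective (the abstract does not say whether the paper uses exactly this form), evaluated on one layer's activations for two views of the same inputs. The view construction and temperature are assumptions.

```python
import numpy as np


def info_nce(z1: np.ndarray, z2: np.ndarray, temperature: float = 0.1) -> float:
    """InfoNCE loss between two views (n, d) of the same n inputs at one layer.

    Row i of z1 and row i of z2 form the positive pair; all other rows of z2
    act as negatives. Only this layer's activations enter the loss, so a
    layer-local update needs no error signal from the output layer.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
matched = info_nce(z, z + 0.01 * rng.normal(size=z.shape))
mismatched = info_nce(z, z[::-1])  # positives deliberately misaligned
print(matched, mismatched)  # aligned views give the lower loss
```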

2605.18556 2026-05-19 cs.RO cs.AI

Key-Gram: Extensible World Knowledge for Embodied Manipulation

Jingjing Fan, Siyuan Li, Botao Ren, Zhidong Deng

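AI summary: This paper proposes the Key-Gram framework, which separates linguistic knowledge from visual-state reasoning to improve how embodied controllers understand and execute compositional language instructions. Its main contribution is an extensible external memory module that improves transfer and real-world manipulation performance.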

Comments
16 pages, 5 figures

English abstract

Embodied control increasingly requires models to follow compositional language instructions while reasoning over dynamic visual states. However, current vision-language-action policies and world-action models often couple linguistic knowledge with visual computation in a shared backbone or conditioning pathway, leading to modality competition and making knowledge extension dependent on backbone updates. In this paper, we introduce Key-Gram, a conditional-memory framework that separates language-derived world knowledge from visual-state reasoning for embodied control. At its core is a memory module that decomposes an instruction into task-specific key-grams, retrieves static linguistic priors through deterministic hashed lookup, and injects the retrieved entries into selected hidden layers through context-aware gating and lightweight convolutional fusion. This design allows the backbone to devote its main capacity to visual reasoning and action inference, while reusable instruction knowledge is stored in an extensible external memory. The logical memory table can be conveniently partitioned during training and, due to its $O(1)$ lookup pattern, efficiently placed in host memory during inference. Across RoboTwin2.0, LIBERO/LIBERO-Plus, and real-world dual-arm manipulation, Key-Gram consistently improves both $\pi_{0}$ and $\pi_{0.5}$ backbones, with average relative gains of $29.5\%/9.9\%$ on RoboTwin2.0, $35.8\%/4.5\%$ on LIBERO-Plus transfer without target-domain fine-tuning, and $15.4\%/8.1\%$ on real-world long-horizon tasks. These results demonstrate that externalized linguistic memory provides an effective and extensible mechanism for improving compositional grounding, transfer, and real-world manipulation.
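A minimal sketch of deterministic hashed lookup into an external memory table, assuming word bigrams as a stand-in for the paper's task-specific key-grams and a single scalar sigmoid gate in place of its context-aware gating and convolutional fusion. `key_grams`, `slot`, `HashedMemory`, and `gated_inject` are hypothetical names. Because the row index depends only on the hash, each lookup costs O(1) regardless of table size, which is what allows such a table to sit in host memory.

```python
import hashlib
from typing import List

import numpy as np


def key_grams(instruction: str, n: int = 2) -> List[str]:
    """Split an instruction into word n-grams (a stand-in for key-grams)."""
    tokens = instruction.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def slot(gram: str, table_size: int) -> int:
    """Deterministic hash to a table row (built-in hash() is salted per process)."""
    digest = hashlib.sha256(gram.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little") % table_size


class HashedMemory:
    """External memory table; each lookup is O(1) regardless of table size."""

    def __init__(self, table_size: int, dim: int, seed: int = 0) -> None:
        rng = np.random.default_rng(seed)
        self.table = rng.normal(0.0, 0.02, size=(table_size, dim))

    def retrieve(self, instruction: str, n: int = 2) -> np.ndarray:
        grams = key_grams(instruction, n)
        if not grams:
            raise ValueError("instruction is shorter than n words")
        rows = [slot(g, self.table.shape[0]) for g in grams]
        return self.table[rows].mean(axis=0)  # average the retrieved entries


def gated_inject(hidden: np.ndarray, memory: np.ndarray, w_gate: np.ndarray) -> np.ndarray:
    """Add retrieved memory to a hidden state through a context-dependent scalar gate."""
    gate = 1.0 / (1.0 + np.exp(-(hidden @ w_gate @ memory)))
    return hidden + gate * memory


mem = HashedMemory(table_size=4096, dim=8)
m = mem.retrieve("put the red block in the bowl")
h = gated_inject(np.ones(8), m, np.eye(8))
print(h.shape)
```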

2605.18554 2026-05-19 cs.LG stat.ML

Federated Martingale Posterior Sampling

Boning Zhang, Matteo Zecchin, Mingzhao Guo, Dongzhu Liu, Osvaldo Simeone

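AI summary: This paper proposes federated martingale posterior sampling, which recovers parameter uncertainty from predictive distributions without sharing local datasets, thereby improving model calibration in federated learning.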

Comments
5 pages

English abstract

Federated Bayesian neural networks require fixing a prior on the model parameters together with a likelihood. Eliciting meaningful priors on the weight space of modern overparameterized models is notoriously difficult, and misspecification of either component can severely degrade accuracy and calibration. Motivated by the rapid progress of predictive models such as large language models, the martingale posterior, also known as predictive Bayes, replaces the prior-likelihood pair with a predictive distribution and recovers parameter uncertainty by repeatedly drawing predictive samples and refitting the model. A direct federated implementation, however, would require clients to share the local data sets. This letter proposes federated martingale posterior (FMP) sampling, a one-shot embarrassingly parallel protocol in which each client uploads a small set of trainable data embeddings and the server runs the predictive sampler centrally. Experiments on MNIST, CIFAR-10, and CIFAR-100 show that FMP closely matches the centralized counterpart and significantly improves calibration over consensus-style baselines.
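The martingale posterior replaces prior and likelihood with a predictive rule: impute future observations by repeatedly sampling from the predictive, then refit the parameter. A minimal centralized sketch, assuming a Polya-urn predictive and the mean as the parameter, is below. It does not reproduce the federated protocol, in which clients upload trainable data embeddings and the server runs the sampler; `martingale_posterior_mean` is a hypothetical name.

```python
import random
from statistics import mean, stdev
from typing import List, Sequence


def martingale_posterior_mean(
    data: Sequence[float], n_future: int = 1000, n_samples: int = 200, seed: int = 0
) -> List[float]:
    """Martingale-posterior draws for a population mean via predictive resampling.

    Predictive rule (an assumption for this sketch): a Polya urn, i.e. the next
    observation is a uniform draw from everything observed or imputed so far.
    Each posterior draw imputes n_future observations and then refits the
    parameter (here simply the mean) on the completed sample.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_samples):
        urn = list(data)
        for _ in range(n_future):
            urn.append(rng.choice(urn))
        draws.append(mean(urn))
    return draws


draws = martingale_posterior_mean([1.0, 2.0, 3.0, 4.0, 5.0])
print(round(mean(draws), 2), round(stdev(draws), 2))  # centred near 3 with nonzero spread
```

The spread of `draws` plays the role of parameter uncertainty; no prior over the mean was ever specified.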

2605.18553 2026-05-19 cs.CV cs.AI

StableHand: Quality-Aware Flow Matching for World-Space Dual-Hand Motion Estimation from Egocentric Video

Huajian Zeng, Chaohua Yao, Yuantai Zhang, Jiaqi Yang, Rolandos Alexandros Potamias, Xingxing Zuo

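AI summary: This paper proposes StableHand, a quality-aware flow-matching framework for recovering the world-space 4D motion of both hands from egocentric video. It decomposes the quality of observations extracted by an off-the-shelf hand pose estimator into four channels and uses a learned quality network to predict these quality signals, making motion estimation more robust.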

Comments
Project Page: https://huajian-zeng.github.io/projects/stablehand/

English abstract

Recovering world-space 4D motion of two interacting hands from egocentric video is a fundamental capability for supervising robot policy learning, where wrist trajectories track the end-effector and finger articulations specify the grasp pose. Two major challenges arise in this setting: hands frequently leave the camera view for extended periods due to head motion, and persistent hand-object interactions cause severe occlusions of one or both hands. Existing methods uniformly condition on noisy hand motion observations without accounting for their per-frame reliability, leading to substantial performance degradation. Our key insight is that accurate world-space hand motion estimation is tightly coupled with the quality of per-frame hand observations. To this end, we decompose the quality of hand motion observations extracted from an off-the-shelf hand pose estimator into four channels: wrist global translation and finger articulations for both hands. We propose StableHand, a quality-aware flow-matching framework conditioned on these four-channel quality signals, which are predicted by a learned quality network. We naturally incorporate the quality signals into the flow-matching process through a per-channel forward schedule, a quality-adjusted velocity target, AdaLN modulation of the DiT denoiser, and a quality-aware ODE initialization. This unified generative process preserves high-quality observations while reconstructing unreliable ones using a learned bimanual motion prior. Experiments on HOT3D and ARCTIC, two egocentric benchmarks featuring long missing-hand spans and persistent hand-object occlusions, show that StableHand achieves state-of-the-art performance across all reported metrics, reducing W-MPJPE by 20-25% compared to the strongest baseline, with the largest gains on heavily occluded ARCTIC sequences.
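W-MPJPE measures mean per-joint position error in world coordinates. The sketch below computes an unaligned version over (frames, joints, 3) arrays, plus the relative reduction used to express gains such as the reported 20-25%. The exact alignment protocol used on HOT3D and ARCTIC is not given in the abstract and is omitted here; the 21-joint hand layout is an assumption.

```python
import numpy as np


def w_mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean per-joint position error of world-frame joints, shape (frames, joints, 3).

    Unaligned variant: benchmark protocols often first align trajectory
    segments to the ground truth, a step omitted here.
    """
    if pred.shape != gt.shape:
        raise ValueError("pred and gt must have the same shape")
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def relative_reduction(baseline: float, ours: float) -> float:
    """Fractional error reduction, e.g. 0.2 for a 20% lower W-MPJPE."""
    return (baseline - ours) / baseline


gt = np.zeros((4, 21, 3))               # 4 frames, 21 joints per hand (assumed layout)
pred = gt + np.array([3.0, 4.0, 0.0])   # every joint displaced by a 3-4-5 offset
print(w_mpjpe(pred, gt))                # 5.0
print(relative_reduction(50.0, 40.0))   # 0.2
```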

2605.18552 2026-05-19 cs.LG q-bio.BM q-bio.QM

Protein Fold Classification at Scale: Benchmarking and Pretraining

Dexiong Chen, Andrei Manolache, Mathias Niepert, Karsten Borgwardt

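AI summary: This paper introduces TEDBench, a large-scale, non-redundant benchmark for protein fold classification built from the Encyclopedia of Domains and Foldseek-clustered AlphaFold structures. On this benchmark, the authors propose the Masked Invariant Autoencoders (MiAE) framework, which learns protein structure representations using a high masking ratio and an SE(3)-invariant encoder and achieves strong performance on TEDBench.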

Comments
Accepted at ICML 2026 (spotlight)

English abstract

Classifying protein topology is essential for deciphering biological function, but progress is held back by the lack of large-scale benchmarks that avoid duplicates and by models that do not scale well. We introduce TEDBench, a large-scale, non-redundant benchmark for protein fold classification constructed from the Encyclopedia of Domains (TED) and Foldseek-clustered AlphaFold structures. We show that on TEDBench, current protein representation learning methods either require very large models or fail to deliver strong performance. To address this challenge, we propose Masked Invariant Autoencoders (MiAE), a self-supervised framework for protein structure representation learning. MiAE uses an extremely high masking ratio of up to 90% with an $\mathrm{SE(3)}$-invariant encoder and a lightweight decoder that reconstructs backbone coordinates from the latent representation and mask tokens. MiAE scales well and outperforms supervised counterparts and state-of-the-art baselines on TEDBench, establishing a strong recipe for protein fold classification. To test transfer beyond AlphaFold structures, we further benchmark on a curated dataset from experimental structures of CATH v4.4. TEDBench is available at https://github.com/BorgwardtLab/TEDBench.
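A minimal sketch of two ingredients named above, under stated assumptions: random masking of 90% of residues, and a check that pairwise-distance features are unchanged by rotations and translations (one simple route to SE(3) invariance, not necessarily the encoder MiAE uses). The coordinates are random stand-ins for backbone atoms, and all function names are hypothetical.

```python
import numpy as np


def random_residue_mask(n_residues: int, mask_ratio: float, rng: np.random.Generator) -> np.ndarray:
    """Boolean mask hiding round(mask_ratio * n_residues) randomly chosen residues."""
    n_mask = int(round(n_residues * mask_ratio))
    mask = np.zeros(n_residues, dtype=bool)
    mask[rng.permutation(n_residues)[:n_mask]] = True
    return mask


def pairwise_distances(x: np.ndarray) -> np.ndarray:
    """Distance matrix of (n, 3) coordinates; unchanged by rotation plus translation."""
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)


def random_rotation(rng: np.random.Generator) -> np.ndarray:
    """Random proper rotation matrix (det = +1) from a QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))   # fix the column signs of the QR factor
    if np.linalg.det(q) < 0:      # a reflection: flip one axis to get det = +1
        q[:, 0] = -q[:, 0]
    return q


rng = np.random.default_rng(0)
coords = rng.normal(size=(100, 3))       # stand-in backbone coordinates
mask = random_residue_mask(100, 0.9, rng)
visible = coords[~mask]                  # the encoder would see only these 10 residues
moved = visible @ random_rotation(rng).T + np.array([5.0, -2.0, 1.0])
print(mask.sum(), np.allclose(pairwise_distances(visible), pairwise_distances(moved)))
```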

2605.18549 2026-05-19 cs.CL cs.CR

Monitoring the Internal Monologue: Probe Trajectories Reveal Reasoning Dynamics

Maciej Chrabąszcz, Aleksander Szymczyk, Marcin Sendera, Tomasz Trzciński, Sebastian Cygert

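AI summary: This paper studies the internal reasoning dynamics of large reasoning models and improves monitoring reliability by analyzing probe trajectories. It proposes a method based on signal-processing features that substantially improves the separation of future model states.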


English abstract

Large Reasoning Models (LRMs) introduce new opportunities for safety monitoring through their Chain of Thought (CoT) reasoning. However, CoT is not always faithful to the model's final output, undermining its reliability as a monitoring tool. To address this, we investigate the hidden representations of LRMs to determine whether future behavior can be predicted from prompt and CoT representations. By evaluating a probe at each generated token, we construct a probe trajectory, the continuous evolution of a concept's probability across the reasoning process. We find that future model behavior is more distinguishable when examined over the full trajectory than from a single static prediction. To characterize these temporal dynamics, we extract signal-processing features that capture volatility, trend, and steady-state behavior, significantly improving the separation of future model states. We also present two methodological insights. First, template-based training data achieves near-parity with dynamically generated model responses, eliminating the need for costly initial inference and labeling. Second, the choice of pooling operation is critical: average-pooling and last-token methods collapse to near-random performance, while max-pooling achieves up to 95% AUROC and yields stable probe trajectories. Using four datasets and four reasoning models across the domains of safety and mathematics, we demonstrate that trajectory features encode task-specific dynamics that improve outcome separability. These findings establish probe trajectories as a complementary framework for monitoring LRM behavior. Warning: This article contains potentially harmful content.
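A minimal sketch of a probe trajectory and three summary features, assuming a logistic probe applied at each step to the running max-pool of the token representations seen so far (one reading of the max-pooling result) and synthetic hidden states. The feature definitions (standard deviation of step differences, least-squares slope, mean of the last `tail` values) are illustrative choices, not the paper's exact features.

```python
import numpy as np


def probe_trajectory(hidden: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    """Probe probability after each generated token.

    hidden: (T, d) per-token representations. At step t the probe sees the
    max-pooled representation of tokens 0..t (prefix max-pooling is an assumption).
    """
    pooled = np.maximum.accumulate(hidden, axis=0)  # running element-wise max
    return 1.0 / (1.0 + np.exp(-(pooled @ w + b)))


def trajectory_features(p: np.ndarray, tail: int = 10) -> dict:
    """Volatility, trend and steady-state summaries of a probe trajectory."""
    t = np.arange(len(p))
    return {
        "volatility": float(np.diff(p).std()),    # size of step-to-step fluctuations
        "trend": float(np.polyfit(t, p, 1)[0]),   # least-squares slope per token
        "steady_state": float(p[-tail:].mean()),  # level reached near the end
    }


rng = np.random.default_rng(0)
hidden = rng.normal(size=(50, 16))
traj = probe_trajectory(hidden, rng.normal(size=16), b=0.0)
print(trajectory_features(traj))
```

A downstream classifier over these features, rather than over a single final probe value, is what would separate future model states in this setup.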

2605.18548 2026-05-19 cs.CL cs.AI

STT-Arena: A More Realistic Environment for Tool-Using with Spatio-Temporal Dynamics

Tingfeng Hui, Hao Xu, Pengyu Zhu, Hongsheng Xin, Kun Zhan, Sen Su, Chunxiao Liu, Ning Miao

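AI summary: This paper proposes the STT-Arena benchmark to evaluate how well large language models adapt their plans under spatio-temporal dynamics. It finds that existing models fall markedly short on such dynamic problems and proposes STT-Agent-4B as an improved approach.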

Comments
Work in progress

English abstract

Large language models (LLMs) deployed in real-world agentic applications must be capable of replanning and adapting when mid-task disruptions invalidate their prior decisions. Existing dynamic benchmarks primarily measure whether LLMs can detect temporal changes in a timely manner, leaving the complementary challenge of adaptive replanning under spatio-temporal dynamics largely unexplored. We introduce STT-Arena (Spatio-Temporal Tool-Use Arena), a benchmark of 227 high-quality interactive tasks spanning nine spatio-temporal conflict types and four solvability levels. Each task is grounded in a realistic, executable environment equipped with injected spatio-temporal triggers that can abruptly invalidate an ongoing plan, forcing the model to detect the state shift and construct a revised execution strategy. Extensive evaluation of frontier LLMs reveals that even state-of-the-art proprietary models, including Claude-4.6-Opus, achieve less than 40% overall accuracy, highlighting the fundamental difficulty of spatio-temporal dynamic reasoning. Systematic analysis of failure trajectories uncovers three recurring error modes of existing models: Stale-State Execution, Misdiagnosis of Dynamic Triggers, and Missing Post-Adaptation Verification. Guided by these findings, we propose an iterative trajectory refinement technique that eliminates these failure patterns from training data, and combine it with online RL to produce STT-Agent-4B, which outperforms frontier LLMs on STT-Arena.

2605.18547 2026-05-19 cs.AI

VISAFF: Speaker-Centered Visual Affective Feature Learning for Emotion Recognition in Conversation

Linan ZHU, Zihao Zhai, Xiao Han, Yuqian Fu, Xiangfan Chen, Xiangjie Kong, Guojiang Shen

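AI summary: This paper proposes VISAFF, a speaker-centered visual affective feature learning framework that addresses complex scenarios in emotion recognition in conversation, improving computational efficiency and avoiding the high cost of fine-tuning large models.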


English abstract

Emotion Recognition in Conversation (ERC) is essential for effective human-machine interaction, aiming to identify speakers' emotional states in multi-turn dialogues. Early text-based methods struggle with complex scenarios like sarcasm because they inherently neglect vital non-verbal information. While recent Vision-Language Models (VLMs) address this by analyzing video directly, they are not inherently tailored for ERC and often focus on emotionally irrelevant background regions or passive listeners rather than the active speaker. Furthermore, fine-tuning these large models incurs prohibitive computational costs. Additionally, isolated visual signals are frequently ambiguous or technically compromised without the context of linguistic content and vocal prosody. To address these challenges, we propose VISAFF, a speaker-centered VISual AFFective feature learning framework for ERC. VISAFF consists of two stages: Speaker-Centered Affective Grounding and Reliability-Guided Affective Complementation. VISAFF utilizes a tuning-free approach to unlock the reasoning capabilities of frozen VLMs, efficiently steering them to focus on the active speaker's emotional visual cues without heavy training overheads. In the second stage, we introduce a reliability-guided affective complementation mechanism that dynamically leverages textual and acoustic modalities to compensate for visual uncertainty. Experiments on two real-world datasets demonstrate that VISAFF achieves highly competitive performance compared to state-of-the-art methods in a tuning-free setting, significantly enhancing computational efficiency by eliminating the need for expensive fine-tuning of large VLMs. The source code is available at https://anonymous.4open.science/r/speaker-2365/.
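The abstract does not specify how visual reliability is measured or how the modalities are combined. A hypothetical sketch of reliability-guided complementation: treat one minus the normalized entropy of the visual emotion distribution as its reliability, and give the remaining weight to the average of the text and audio distributions. The function names and example distributions are invented.

```python
import numpy as np


def normalized_entropy(p) -> float:
    """Entropy of a probability vector divided by its maximum, log(K); lies in [0, 1]."""
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1.0)
    return float(-(p * np.log(p)).sum() / np.log(len(p)))


def complement(visual, text, audio) -> np.ndarray:
    """Blend emotion distributions, trusting vision in proportion to its confidence."""
    reliability = 1.0 - normalized_entropy(visual)  # 1 = confident, 0 = uniform
    backup = (np.asarray(text, float) + np.asarray(audio, float)) / 2.0
    fused = reliability * np.asarray(visual, float) + (1.0 - reliability) * backup
    return fused / fused.sum()


text = [0.7, 0.2, 0.1]
audio = [0.5, 0.3, 0.2]
print(complement([1 / 3, 1 / 3, 1 / 3], text, audio))  # uncertain vision: falls back to text and audio
print(complement([0.98, 0.01, 0.01], text, audio))     # confident vision dominates
```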

2605.18543 2026-05-19 cs.RO

Geometry-Aware Surrogate for Real-Time Hydrodynamics Estimation of Autonomous Ground Vehicles in Amphibious Environments

Ammar Waheed, Luke Gallantree, Zohaib Hasnain

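AI summary: This paper proposes a geometry-aware neural network surrogate for real-time estimation of hydrodynamic forces on autonomous ground vehicles in amphibious environments. Trained on high-fidelity CFD data, it predicts accurately across vehicle geometry, depth, and flow direction, and its applicability is demonstrated in real-world trials.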


English abstract

Autonomous ground vehicles operating in shallow water or flood-prone terrains require dynamic models that account for hydrodynamic forces. However, the simulation and planning tools currently available either lack the physical fidelity or are too computationally expensive to run in real time. This work presents a per-surface neural network surrogate that bridges this gap by predicting geometry-resolved hydrodynamic forces at real-time rates, trained entirely on high-fidelity CFD data from two geometrically distinct vehicles. A vehicle-specific Signed Distance Field (SDF) provides per-surface submergence inputs, allowing the model to resolve how loading varies with vehicle geometry, depth, and flow direction. On held-out CFD data, the surrogate achieves a longitudinal-force symmetric MAPE (sMAPE) of 13% and a vertical-force sMAPE of 3-12%, with inference running under 0.9 ms per sample. To evaluate the model under real-world conditions, water wading trials of a full-scale vehicle at different submersion depths are used. Motion-capture-derived kinematics serve as the surrogate inputs, and the resulting predictions are tested for whether they reproduce known physical relationships between force, speed, and depth. The predicted drag follows quadratic speed scaling ($R^2 \geq 0.97$) and the buoyancy intercepts scale linearly with depth ($R^2 = 0.973$). Neither relationship is encoded in the model training loss; both emerge from the per-surface architecture summing individually predicted surface forces. The resulting framework provides a pathway for embedding physically grounded hydrodynamics into the simulation and planning loops that autonomous ground vehicles depend on in amphibious environments.
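The error metric and the emergent-scaling check can be sketched as follows. sMAPE has several variants; this uses the common 2|pred - true| / (|true| + |pred|) form. The drag check fits drag = a * v^2 without an intercept (whether the authors' fit includes one is not stated), and the total force is a plain sum over per-surface predictions, mirroring the architecture described above. All inputs are made-up numbers.

```python
import numpy as np


def smape(y_true, y_pred) -> float:
    """Symmetric MAPE in percent: mean of 2|pred - true| / (|true| + |pred|).

    One common definition; undefined when true and pred are both zero.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ratio = 2.0 * np.abs(y_pred - y_true) / (np.abs(y_true) + np.abs(y_pred))
    return float(100.0 * np.mean(ratio))


def vehicle_force(per_surface_forces) -> np.ndarray:
    """Total force as the sum of per-surface predictions: (surfaces, 3) -> (3,)."""
    return np.asarray(per_surface_forces, dtype=float).sum(axis=0)


def r2_quadratic_drag(speed, drag) -> float:
    """R^2 of a least-squares fit drag = a * speed**2 (no intercept)."""
    v2 = np.asarray(speed, dtype=float) ** 2
    d = np.asarray(drag, dtype=float)
    a = (v2 @ d) / (v2 @ v2)
    resid = d - a * v2
    return float(1.0 - resid @ resid / ((d - d.mean()) @ (d - d.mean())))


print(smape([100.0], [110.0]))                 # about 9.52 (%)
print(vehicle_force([[1, 0, 2], [3, 0, -1]]))  # [4. 0. 1.]
print(r2_quadratic_drag([0.5, 1.0, 1.5, 2.0], [0.25, 1.0, 2.25, 4.0]))  # 1.0 for exact v^2 scaling
```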

2605.18541 2026-05-19 cs.CV

LESSViT: Robust Hyperspectral Representation Learning under Spectral Configuration Shift

Haozhe Si, Yuxuan Wan, Yuqing Wang, Minh Do, Han Zhao

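AI summary: This paper proposes LESSViT, a flexible architecture for cross-spectral generalization. Using a low-rank efficient spatial-spectral ViT, it addresses hyperspectral image modeling across different sensors and improves both robustness and efficiency.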


English abstract

Modeling hyperspectral imagery (HSI) across different sensors presents a fundamental challenge due to variations in wavelength coverage, band sampling, and channel dimensionality. As a result, models trained under a fixed spectral configuration often fail to generalize to other sensors. Existing Vision Transformer (ViT) approaches either rely on implicit spectral modeling with fixed channel assumptions or adopt explicit spatial-spectral attention with prohibitive computational cost, leading to a fundamental trade-off between efficiency and expressiveness. In this work, we introduce Low-rank Efficient Spatial-Spectral ViT (LESSViT), a sensor-flexible architecture for cross-spectral generalization. LESSViT is built on LESS Attention, a structured low-rank factorization that models joint spatial-spectral interactions through separable spatial and spectral components, reducing the complexity of full spatial-spectral attention from $O(N^2 C^2)$ to $O(rNC)$, where $N$ is the number of spatial tokens, $C$ is the number of spectral channels, and $r$ is the rank of the low-rank approximation. We further incorporate channel-agnostic patch embedding and wavelength-aware positional encoding to support flexible spectral inputs. To enable efficient and robust pretraining, we introduce a hyperspectral masked autoencoder (HyperMAE) with decoupled spatial-spectral masking and hierarchical channel sampling. We evaluate LESSViT under a cross-spectral generalization setting that simulates cross-sensor variability. Experiments on the SpectralEarth benchmark demonstrate that LESSViT improves robustness under spectral shifts while remaining competitive in-distribution, and indicate that explicit and efficient spatial-spectral modeling is essential for scalable and generalizable hyperspectral representation learning.
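A quick arithmetic check of the quoted complexities, for a hypothetical input of 196 spatial tokens (14 x 14 patches), 200 spectral channels, and rank 16. Constants are ignored, so the numbers indicate scaling only; the ratio between the two costs is $NC/r$.

```python
def full_attention_pairs(n_spatial: int, n_channels: int) -> int:
    """Score entries for full attention over all N*C spatial-spectral tokens: (NC)^2."""
    tokens = n_spatial * n_channels
    return tokens * tokens


def less_attention_cost(n_spatial: int, n_channels: int, rank: int) -> int:
    """Cost scaling r*N*C quoted for LESS Attention (constants ignored)."""
    return rank * n_spatial * n_channels


# Hypothetical setting: 14x14 patches, 200 bands, rank 16.
N, C, r = 196, 200, 16
print(full_attention_pairs(N, C))    # 1536640000
print(less_attention_cost(N, C, r))  # 627200
print(full_attention_pairs(N, C) // less_attention_cost(N, C, r))  # 2450 = N*C / r
```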

2605.18537 2026-05-19 cs.LG cs.AI stat.ML

Probing for Representation Manifolds in Superposition

Alexander Modell

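AI summary: This paper proposes the Manifold Probe, a method for discovering representation manifolds in superposition. It learns a linearly predictable feature space together with the directions that encode it, revealing manifolds that are causally involved in model behavior.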

Comments
19 pages, 7 figures

English abstract

This paper introduces the Manifold Probe, a supervised method for discovering representation manifolds in superposition. The method generalizes linear regression probes by learning the space of features of a concept that can be linearly predicted from the representations, and then learning the directions used to encode them. We demonstrate the probe on representations of time and space in Llama 2-7b, finding manifolds which linearly represent an interpretable set of features in each case. In the case of time, we show that by steering along the manifold, we can influence the model's completions about the years in which famous songs, movies and books were released, providing evidence that the Manifold Probe can discover manifolds which are causally involved in model behaviour.
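A minimal sketch of the first step described above, checking which features of a concept are linearly predictable from representations, using ridge regression on synthetic data in which a year is encoded along a curved one-dimensional manifold. The activations, feature map, encoding matrix `A`, and dimensions are assumptions; real use would substitute Llama 2-7b hidden states, and the columns of `A` stand in for the encoding directions the second step would estimate.

```python
import numpy as np


def fit_probe(H: np.ndarray, F: np.ndarray, ridge: float = 1e-3) -> np.ndarray:
    """Ridge-regression weights (d, k) mapping representations H (n, d) to features F (n, k)."""
    d = H.shape[1]
    return np.linalg.solve(H.T @ H + ridge * np.eye(d), H.T @ F)


def r2_per_feature(F: np.ndarray, F_hat: np.ndarray) -> np.ndarray:
    """Coefficient of determination for each feature column."""
    ss_res = ((F - F_hat) ** 2).sum(axis=0)
    ss_tot = ((F - F.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot


rng = np.random.default_rng(0)
years = np.arange(1950, 2021)
z = (years - years.mean()) / (years.max() - years.mean())  # scaled to [-1, 1]
F = np.stack([z, z**2 - (z**2).mean()], axis=1)            # curved 1-D manifold in a 2-D feature space
A = rng.normal(size=(32, 2))                                # hidden encoding directions
H = F @ A.T + 0.01 * rng.normal(size=(len(years), 32))      # synthetic "activations"
W = fit_probe(H, F)
print(r2_per_feature(F, H @ W))  # both features are linearly decodable
```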

2605.18535 2026-05-19 cs.LG cs.MA

Beyond Scaling: Agents Are Heading to the Edge

Chunlin Tian, Dongqi Cai, Wanru Zhao, Nicholas D. Lane

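AI summary: This paper argues that the bottleneck in agent technology has shifted from compressing world knowledge into a single model to executing a coordinated system, and that personal-agent architectures must move to edge computing to meet the need for structural coupling with high-fidelity local context and zero-latency execution loops.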


English abstract

The bottleneck of useful agentic intelligence has shifted from compressing world knowledge into a single model to executing a coordinated system. This position paper argues that personal-agent architecture must move to the edge because the core properties of agentic intelligence tasks, particularly their structural coupling with high-fidelity local context and the need for zero-latency execution loops, do not sit well with cloud-centric designs. We develop this claim through three structural shifts. First, the Prefrontal Turn: the main marginal lever of capability has moved from pre-training scale to framework-level executive control. Such control must remain physically close to the environment of action if the agent is to preserve cognitive alignment. Second, the Data-Geography Paradox: the "dark matter" of agentic data (local file hierarchies, real-time sensor streams, and transient OS states) degrades, disappears, or loses meaning once prepared for cloud transmission, thereby cutting the agent off from ground-truth context. Third, the interaction-alignment loop: the only economically and ecologically sustainable source of agentic refinement data is the high-fidelity implicit preference signal produced through real-time local interaction. We conclude with falsifiable predictions for the next deployment cycle of personal agents.

2605.18534 2026-05-19 cs.LG

XCTFormer: Leveraging Cross-Channel and Cross-Time Dependencies for Enhanced Time-Series Analysis

Israel Zexer, Omri Azencot

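AI summary: This paper proposes XCTFormer, which uses an enhanced attention mechanism to explicitly capture cross-time and cross-channel dependencies in time series, improving time-series analysis and achieving state-of-the-art results on the imputation task in particular.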

Comments
TMLR 2026

English abstract

Multivariate time-series analysis involves extracting informative representations from sequences of multiple interdependent variables, supporting tasks such as forecasting, imputation, and anomaly detection. In real-world scenarios, these variables are typically collected from a shared context or underlying phenomenon, suggesting the presence of latent dependencies across time and channels that can be leveraged to improve performance. However, recent findings show that channel-independent (CI) models, which assume no inter-variable dependencies, often outperform channel-dependent (CD) models that explicitly model such relationships. This surprising result indicates that current CD models may not fully exploit their potential due to limitations in how dependencies are captured. Recent studies have revisited channel dependence modeling with various approaches; however, these methods often employ indirect modeling strategies, which can lead to meaningful dependencies being overlooked. To address this issue, we introduce XCTFormer, a transformer-based channel-dependent (CD) model that explicitly captures cross-temporal and cross-channel dependencies via an enhanced attention mechanism. The model operates in a token-to-token fashion, modeling pairwise dependencies between every pair of tokens across time and channels. The architecture comprises (i) a data processing module, (ii) a novel Cross-Relational Attention Block (CRAB) that increases capacity and expressiveness, and (iii) an optional Dependency Compression Plugin (DeCoP) that improves scalability. Through extensive experiments on three time-series benchmarks, we show that XCTFormer achieves strong results compared to widely recognized baselines; in particular, it attains state-of-the-art performance on the imputation task, outperforming the second-best method by an average of 20.8% in MSE and 15.3% in MAE.
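A minimal single-head sketch of token-to-token attention, assuming one embedding per (time step, channel). Flattening the T x C grid lets every token attend to every other across both time and channels. The CRAB and DeCoP components are not reproduced. Note that the attention matrix grows as (T*C)^2, which is the kind of scalability pressure a compression plugin would address; the function names and sizes are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def token_to_token_attention(X: np.ndarray, Wq: np.ndarray, Wk: np.ndarray, Wv: np.ndarray):
    """Single-head attention over every (time step, channel) token pair.

    X: (T, C, d), one embedding per time step and channel. Returns (T, C, d_v)
    outputs and the (T*C, T*C) attention weights.
    """
    T, C, d = X.shape
    tokens = X.reshape(T * C, d)
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    weights = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return (weights @ v).reshape(T, C, -1), weights


rng = np.random.default_rng(0)
T, C, d = 12, 7, 16
X = rng.normal(size=(T, C, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
out, weights = token_to_token_attention(X, Wq, Wk, Wv)
print(out.shape, weights.shape)  # (12, 7, 16) (84, 84)
```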

2605.18530 2026-05-19 cs.CL cs.AI cs.LG stat.ML

Continuous Diffusion Scales Competitively with Discrete Diffusion for Language

Zhihan Yang, Wei Guo, Shuibai Zhang, Subham Sekhar Sahoo, Yongxin Chen, Arash Vahdat, Morteza Mardani, John Thickstun

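AI summary: This paper studies how well continuous diffusion models scale for language modeling. By improving the Plaid model to build RePlaid, it shows that continuous diffusion models can compete with discrete models in compute efficiency and performance, and it provides theoretical support for this result.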

Abstract

While diffusion has drawn considerable recent attention from the language modeling community, continuous diffusion has appeared less scalable than discrete approaches. To challenge this belief, we revisit Plaid, a likelihood-based continuous diffusion language model (DLM), and construct RePlaid by aligning the architecture of Plaid with modern discrete DLMs. In this unified setting, we establish the first scaling law for continuous DLMs that rivals discrete DLMs: RePlaid exhibits a compute gap of only $20\times$ compared to autoregressive models, outperforms Duo while using fewer parameters, and outperforms MDLM in the over-trained regime. We benchmark RePlaid against recent continuous DLMs: on OpenWebText, RePlaid achieves a new state-of-the-art PPL bound of $22.1$ among continuous DLMs, along with superior generation quality. These results suggest that continuous diffusion, when trained via likelihood, is a highly competitive and scalable alternative to discrete DLMs. Moreover, we offer theoretical insights into the advantage of likelihood-based training. We show that optimizing the noise schedule to minimize the ELBO's variance naturally yields cross-entropy (information loss) that is linear in time. This evenly distributes denoising difficulty without any case-specific time reparameterization. In addition, we find that optimizing embeddings via likelihood creates structured geometries and drives the most significant likelihood gain.

2605.18529 2026-05-19 cs.AI

AMR-SD: Asymmetric Meta-Reflective Self-Distillation for Token-Level Credit Assignment

Zhenlin Wei, Pu Jian, Yingzhuo Deng, Xiaohan Wang, Jiajun Chai, Zhexin Hu, Wei Lin, Shanbin Zhang, Guojun Yin

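AI summary: This paper proposes AMR-SD, an asymmetric meta-reflective self-distillation method that addresses the credit-assignment bottleneck caused by uniform sequence-level rewards when training large language models for complex reasoning. By introducing a reflection bottleneck and a causal information gain mechanism, it achieves more precise token-level advantage modulation.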

Abstract

The alignment of Large Language Models (LLMs) for complex reasoning heavily relies on Reinforcement Learning with Verifiable Rewards (RLVR). However, standard algorithms like GRPO apply sequence-level rewards uniformly to all tokens, creating a severe credit-assignment bottleneck. While on-policy self-distillation attempts to resolve this by conditioning a self-teacher on privileged contexts, direct exposure to raw oracle solutions often induces over-conditioned teacher distributions, implicit answer leakage, and late-stage training collapse. To overcome these limitations, we propose Asymmetric Meta-Reflective Self-Distillation (AMR-SD). Instead of conditioning directly on raw reference traces, AMR-SD inserts a reflection bottleneck: it compresses diagnostic signals -- from verifier outcomes, peer rollouts, or reference feedback -- into concise, self-generated Socratic hints and critiques. Furthermore, we introduce Causal Information Gain (CIG) with an asymmetric, ReLU-gated threshold to translate these reflections into sparse, highly precise token-level advantage modulations. Combined with temporal annealing, this mechanism preserves the base environmental reward while filtering out distributional noise. Experiments across scientific, mathematical, and tool-use benchmarks demonstrate that AMR-SD significantly outperforms existing baselines, achieving robust long-horizon stability and successfully preventing late-stage collapse.
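To make the credit-assignment bottleneck concrete, the sketch below assumes the common GRPO recipe: rewards are normalized within a group of rollouts, and each rollout's advantage is copied to every one of its tokens. The ReLU-gated function is a hypothetical stand-in for CIG, not the paper's formula. It assumes a per-token "gain" score is already available (for example, some log-probability difference under a hint-conditioned teacher), and only tokens whose gain exceeds a threshold `tau` have their advantage changed. The abstract does not specify CIG's exact form, its asymmetry, or the annealing schedule.

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantage: (r - mean) / std over one group of rollouts."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

def broadcast_to_tokens(adv, lengths):
    """Standard GRPO credit: every token of rollout i gets the same advantage adv[i]."""
    return [np.full(n, a) for a, n in zip(adv, lengths)]

def gated_modulation(token_adv, token_gain, tau=0.5):
    """HYPOTHETICAL ReLU-gated modulation, not the paper's CIG formula.

    Tokens whose gain does not exceed tau keep their advantage unchanged, so the
    adjustment is sparse; tokens above the gate have their advantage amplified.
    """
    gate = np.maximum(np.asarray(token_gain, dtype=float) - tau, 0.0)
    return token_adv * (1.0 + gate)

adv = grpo_advantages([1.0, 0.0, 0.0, 1.0])        # two correct, two wrong rollouts
per_token = broadcast_to_tokens(adv, lengths=[3, 2, 4, 3])
gain = np.array([0.1, 0.9, 0.2])                    # hypothetical gains for rollout 0
modulated = gated_modulation(per_token[0], gain)
print(per_token[0])   # uniform: every token shares the sequence-level advantage
print(modulated)      # only the second token (gain 0.9 > tau) is changed
```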

2605.18522 2026-05-19 cs.CV cs.AI cs.LG

Beyond Morphology: Quantifying the Diagnostic Power of Color Features in Cancer Classification

Farnaz Kheiri, Shahryar Rahnamayan, Masoud Makrehchi

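AI summary: This paper studies the diagnostic power of color features in cancer classification. Excluding morphological information, it evaluates the discriminative power of global color features and finds that color features alone can reach accuracies of up to 89% on binary classification tasks, indicating that color distributions carry a non-random diagnostic signal.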

Abstract

In histopathology, human experts primarily rely on color as a means of enhancing contrast to interpret tissue morphology, whereas machine vision models process color as raw statistical information. This distinction raises a fundamental question: to what extent can pixel intensity alone, independent of structural and morphological cues, support cancer classification? To address this question, we systematically evaluated the standalone discriminative power of global color features while deliberately excluding all morphological information. Specifically, we extracted statistical color moments and discretized RGB and HSV color histograms, and assessed their performance across ten diverse experimental settings using classical machine learning classifiers. Our results demonstrate that color features alone can achieve strong performance in binary diagnostic tasks (e.g., benign versus malignant), with classification accuracies reaching up to 89%. This performance is likely attributable to global chromatic shifts associated with malignancy. Importantly, these simple color-based representations consistently outperformed random baselines by a substantial margin, indicating that raw color distributions encode a non-random and diagnostically relevant signal for cancer detection. Consequently, this study suggests that simple, computationally efficient color features can serve as an effective pre-screening tool. By identifying samples with strong chromatic indicators of malignancy, these lightweight models could function as a first-pass triage system, reducing the computational burden on complex deep learning architectures.
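The following is a minimal sketch of the kind of morphology-free descriptor described above, assuming an RGB image stored as a uint8 NumPy array. It computes the first three statistical moments per channel plus a coarse per-channel histogram. The bin count (8) and the use of skewness as the third moment are illustrative choices, not the paper's settings. An HSV variant would apply the same functions after a color-space conversion.

```python
import numpy as np

def color_moments(img):
    """Mean, standard deviation, and skewness for each channel of an (H, W, 3) image."""
    x = img.reshape(-1, img.shape[-1]).astype(float)
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    safe_std = np.where(std > 0, std, 1.0)          # avoid division by zero on flat channels
    skew = (((x - mean) / safe_std) ** 3).mean(axis=0)
    return np.concatenate([mean, std, skew])

def color_histogram(img, bins=8):
    """Per-channel histogram over [0, 256), normalized to sum to 1 within each channel."""
    feats = []
    for c in range(img.shape[-1]):
        h, _ = np.histogram(img[..., c], bins=bins, range=(0, 256))
        feats.append(h / h.sum())
    return np.concatenate(feats)

def color_features(img, bins=8):
    """Global color descriptor: pixel positions (and hence morphology) are ignored."""
    return np.concatenate([color_moments(img), color_histogram(img, bins)])

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
f = color_features(img)
print(f.shape)  # (33,)
```

Because these features ignore pixel positions, shuffling the pixels of an image leaves them unchanged. This is the sense in which morphology is excluded, and the resulting vectors could then be passed to any classical classifier.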

2605.18512 2026-05-19 cs.CL

Easier to Judge than to Find: Predicting In-Context Learning Success for Demonstration Selection

Haochun Wang, Chaofen Yang, Jiatong Liu, Jingbo Wang, Zewen Qiang, Sendong Zhao, Bing Qin, Ting Liu

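AI summary: This paper proposes DiSP, a framework that uses difficulty-stratified judging and a lightweight router model to predict whether a demonstration selection will succeed in in-context learning, achieving higher accuracy and faster inference across multiple datasets.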

Comments
ICML 2026

Abstract

In-context learning (ICL) is highly sensitive to which demonstrations appear in the prompt, but selecting them is expensive because the space of possible demonstration contexts and combinations is enormous. We argue that demonstration selection is easier to judge than to find: predicting whether a specific query-context pair $(q,D)$ will succeed is cheaper and more general than searching for an optimal $D^\star$. Based on this insight, we propose DiSP, a sample-and-judge framework that stratifies queries by difficulty. DiSP runs random demonstration trials to estimate the success rate of each training query, trains a lightweight router to predict difficulty from the query, and trains level-specific judges for sampled demonstrations. At inference, DiSP performs stop-on-acceptance judging under an explicit budget, emitting diagnostic risk tags when no suitable context is found. Across five classification datasets with Llama 3-8B and Qwen 2.5-7B, DiSP achieves the best average accuracy, improving over strong learned selection baselines by up to 3.4%, while achieving up to $23\times$ end-to-end wall-clock speedup.
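The stop-on-acceptance step can be sketched as a budgeted loop. The sketch assumes a `judge` callable that returns an estimated success score for a (query, context) pair and a `sample_context` callable that draws a random demonstration set. Both names, the acceptance threshold, and the tag string are hypothetical placeholders, not DiSP's interface. The loop stops at the first accepted context and falls back to a risk tag when the budget is exhausted.

```python
import random
from typing import Callable, List, Optional, Tuple

Context = List[str]

def stop_on_acceptance(
    query: str,
    sample_context: Callable[[random.Random], Context],
    judge: Callable[[str, Context], float],
    budget: int,
    threshold: float,
    rng: random.Random,
) -> Tuple[Optional[Context], Optional[str], int]:
    """Draw random demonstration sets until the judge accepts one or the budget runs out.

    Returns (context, risk_tag, judge_calls); risk_tag is None when a context was accepted.
    """
    best: Optional[Context] = None
    best_score = float("-inf")
    for calls in range(1, budget + 1):
        ctx = sample_context(rng)
        score = judge(query, ctx)
        if score >= threshold:
            return ctx, None, calls            # accept and stop early
        if score > best_score:
            best, best_score = ctx, score
    # Budget exhausted: return the best-scoring context but flag the query as risky.
    return best, "no_accepted_context", budget

# Toy usage with a stand-in judge that only trusts contexts containing "d3".
pool = ["d1", "d2", "d3", "d4", "d5"]
sample = lambda r: r.sample(pool, 2)
toy_judge = lambda q, ctx: 0.9 if "d3" in ctx else 0.2
ctx, tag, calls = stop_on_acceptance("q", sample, toy_judge, budget=20, threshold=0.8, rng=random.Random(0))
print(ctx, tag, calls)
```

Because judging stops as soon as one context passes, easy queries cost a single judge call. Only queries that keep failing consume the full budget, which is where a speedup over exhaustive search could come from.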

2605.18511 2026-05-19 cs.AI cond-mat.mtrl-sci eess.SP

A Practical Noise2Noise Denoising Pipeline for High-Throughput Raman Spectroscopy

David Martin-Calle, Cesar Alvarez Llamas, Vincent Motto-Ros, Christophe Dujardin, Jérémie Margueritat, David Rodney

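AI summary: This paper presents a lightweight, reproducible denoising pipeline for high-throughput Raman spectroscopy. It uses a one-dimensional convolutional autoencoder trained with a Noise2Noise strategy, which requires neither external spectral libraries nor high signal-to-noise reference spectra. Trained on a reduced set of repeated short-exposure acquisitions, the model effectively suppresses stochastic noise and reconstructs Raman spectra. Evaluation on a heterogeneous mineral sample shows that an integration time of 5 ms per spectrum, although usually insufficient for reliable interpretation, yields high-fidelity denoised spectra while preserving chemically coherent maps. The work offers a practical trade-off between spectral quality and acquisition speed, making fast, adaptable Raman workflows suitable for routine laboratory use, and provides a framework transferable to other one-dimensional spectroscopic modalities.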

Abstract

A lightweight and reproducible denoising pipeline for high-throughput Raman spectroscopy is presented. The approach relies on a one-dimensional convolutional autoencoder trained using a Noise2Noise strategy, requiring neither external spectral libraries nor high signal-to-noise reference spectra for training. From a reduced training subset composed of repeated short-exposure acquisitions, the model learns to reconstruct Raman spectra while efficiently suppressing stochastic noise. The method is evaluated on a heterogeneous mineral sample, using both quantitative spectral fidelity metrics (RMSE, SNR, SSIM) and task-oriented criteria based on unsupervised K-means classification. Results demonstrate that integration times as short as 5 ms per spectrum, which are typically insufficient for reliable interpretation, yield denoised spectra with high fidelity to the reference data while preserving chemically coherent maps. This work provides a practical trade-off between spectral quality and acquisition speed, enabling fast, adaptable Raman workflows compatible with routine laboratory use. It also offers a transferable framework for other one-dimensional spectroscopic modalities.
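The Noise2Noise principle can be demonstrated without a neural network. In this minimal NumPy sketch, a linear 1D convolution kernel, standing in for the paper's convolutional autoencoder, is fit by least squares. It maps one noisy acquisition of a synthetic spectrum onto a second, independently noisy acquisition of the same spectrum, and no clean target is ever used. The synthetic Gaussian peaks, the noise level, and the kernel width are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x_axis = np.linspace(0, 1, 512)

def synthetic_spectrum(rng):
    """Clean spectrum: a few Gaussian peaks at random positions (stand-in for Raman bands)."""
    s = np.zeros_like(x_axis)
    for _ in range(4):
        c, w, a = rng.uniform(0.05, 0.95), rng.uniform(0.005, 0.02), rng.uniform(0.5, 1.0)
        s += a * np.exp(-0.5 * ((x_axis - c) / w) ** 2)
    return s

def windows(signal, k):
    """Stack length-k sliding windows (edge-padded) so a conv kernel becomes a linear regression."""
    padded = np.pad(signal, k // 2, mode="edge")
    return np.lib.stride_tricks.sliding_window_view(padded, k)

K, sigma = 15, 0.2
inputs, targets = [], []
for _ in range(50):
    s = synthetic_spectrum(rng)
    a = s + sigma * rng.normal(size=s.shape)   # first short exposure
    b = s + sigma * rng.normal(size=s.shape)   # repeated short exposure with independent noise
    inputs.append(windows(a, K))
    targets.append(b)                          # Noise2Noise: the training target is also noisy

kernel, *_ = np.linalg.lstsq(np.concatenate(inputs), np.concatenate(targets), rcond=None)

# Evaluate on a fresh spectrum against its clean version (RMSE is one of the listed metrics).
s_test = synthetic_spectrum(rng)
noisy_test = s_test + sigma * rng.normal(size=s_test.shape)
denoised = windows(noisy_test, K) @ kernel
rmse = lambda u, v: float(np.sqrt(np.mean((u - v) ** 2)))
print(rmse(noisy_test, s_test), rmse(denoised, s_test))
```

The noisy-only target works because, with zero-mean noise that is independent between the two exposures, the expected squared error against the second noisy copy equals the error against the clean spectrum plus a constant. Minimizing one therefore minimizes the other.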

2605.18509 2026-05-19 cs.LG

Offline Contextual Bandits in the Presence of New Actions

Ren Kishimoto, Tatsuhiro Shimizu, Kazuki Kawamura, Takanori Muroi, Yusuke Narita, Yuki Sasamoto, Kei Tateno, Takuma Udagawa, Yuta Saito

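AI summary: This paper studies off-policy learning (OPL) for contextual bandits when new actions are introduced after the logging policy has been deployed. It proposes a new OPL method, built on the Local Combination PseudoInverse (LCPI) estimator and the Policy Optimization for Effective New Actions (PONA) algorithm, that effectively learns and selects new actions while maintaining overall policy performance.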

Comments
12 pages, 7 figures

Abstract

Automated decision-making algorithms drive applications such as recommendation systems and search engines. These algorithms often rely on off-policy contextual bandits or off-policy learning (OPL). Conventionally, OPL selects actions that maximize the expected reward from an existing action set. However, in many real-world scenarios, actions such as news articles or video content change continuously, and the action space evolves over time after data collection. We define actions introduced after deploying the logging policy as new actions and focus on OPL with new actions. Existing OPL methods effectively identify optimal actions from the existing set but cannot learn and select new actions because no relevant data are logged. To address this limitation, we propose a new OPL method that leverages action features. We first introduce the Local Combination PseudoInverse (LCPI) estimator for the policy gradient, generalizing the PseudoInverse estimator initially proposed for off-policy evaluation of slate bandits. LCPI controls the trade-off between the reward-modeling condition and the data-collection condition with respect to the action features, capturing the interaction effects among different dimensions of action features. Furthermore, we propose a generalized algorithm called Policy Optimization for Effective New Actions (PONA), which integrates LCPI, a component specialized for new action selection, with Doubly Robust (DR), which excels at learning within existing actions. We define PONA as a weighted sum of the LCPI and DR estimators, optimizing the selection of both existing and new actions and allowing the proportion of new action selections to be adjusted by the weight parameter. Through extensive experiments, we demonstrate that PONA efficiently selects new actions while maintaining overall policy performance, whereas most existing methods cannot select new actions.
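For reference, the textbook form of the Doubly Robust (DR) estimator, which PONA is described as mixing with LCPI, is sketched below. It assumes logged data with known logging propensities `pi0` and a reward model `q_hat`. The paper applies its estimators to policy gradients, whereas this sketch estimates a policy value. LCPI itself is not specified in the abstract, so the combination function simply takes an LCPI estimate as input. Both the function `weighted_combination` and its weight `alpha` are hypothetical.

```python
import numpy as np

def dr_value(pi, pi0, actions, rewards, q_hat):
    """Doubly Robust off-policy value estimate.

    pi, pi0, q_hat: arrays of shape (n, n_actions) with target-policy probabilities,
    logging-policy probabilities, and reward-model predictions for each logged context.
    actions, rewards: logged action indices and observed rewards, shape (n,).
    """
    idx = np.arange(len(actions))
    direct = (pi * q_hat).sum(axis=1)                    # model-based term over all actions
    w = pi[idx, actions] / pi0[idx, actions]             # importance weight of the logged action
    correction = w * (rewards - q_hat[idx, actions])     # residual correction on logged data
    return float(np.mean(direct + correction))

def weighted_combination(v_lcpi, v_dr, alpha):
    """HYPOTHETICAL PONA-style mix: alpha weights the LCPI term, (1 - alpha) the DR term."""
    return alpha * v_lcpi + (1.0 - alpha) * v_dr

# Toy log: uniform logging policy over 2 actions, target policy always picks action 0.
pi0 = np.full((3, 2), 0.5)
pi = np.tile([1.0, 0.0], (3, 1))
actions = np.array([0, 1, 0])
rewards = np.array([1.0, 0.0, 1.0])
print(dr_value(pi, pi0, actions, rewards, q_hat=np.zeros((3, 2))))  # reduces to plain IPS here
```

An action added after logging never appears in `actions`, so the correction term never involves it and its estimated value rests entirely on `q_hat`. This is consistent with the abstract's point that methods relying on logged data struggle to select new actions.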

2605.18508 2026-05-19 cs.LG cs.AI

DiPRL: Learning Discrete Programmatic Policies via Architecture Entropy Regularization

Chengpeng Hu, Yingqian Zhang, Hendrik Baier

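AI summary: This paper proposes DiPRL, a method that learns interpretable programmatic policies via architecture entropy regularization, avoiding a post-hoc refinement stage and improving policy expressivity and task performance.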

Abstract

Programmatic reinforcement learning (PRL) offers an interpretable alternative to deep reinforcement learning by representing policies as human-readable and -editable programs. While gradient-based methods have been developed to optimize continuous relaxations of programs, they face a significant performance drop when converting the continuous relaxations back into discrete programs. Post-hoc discretization can discard optimized branches and parameters in a program, which results in a collapse of policy expressivity and lowered task performance, leading in turn to a need for additional fine-tuning. To overcome these limitations, we propose Differentiable Discrete Programmatic Reinforcement Learning (DiPRL), a method that learns programmatic policies that become nearly discrete during training, avoiding a separate post-hoc fine-tuning stage. We first analyze the inherent risks of performance drop introduced by post-hoc discretization of gradient-based methods. Then, we introduce programmatic architecture entropy regularization, which enables smooth, differentiable training that encourages convergence toward a discrete program. DiPRL maintains the efficiency of gradient-based optimization while mitigating the risks of post-hoc discretization. Our experiments across multiple discrete and continuous RL tasks demonstrate that DiPRL can achieve strong performance via interpretable programmatic policies.
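The effect of an architecture-entropy penalty can be seen in a toy relaxation. The setup assumes a single program choice point whose K branches are mixed with softmax weights, with a relaxed reward equal to the weighted sum of per-branch rewards. This is an illustrative sketch under those assumptions, not DiPRL's regularizer. Adding `lam` times the entropy of the branch distribution to the loss pushes the weights toward one-hot during training, so a later argmax discards little.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def entropy(p):
    return float(-(p * np.log(p + 1e-12)).sum())

def train_choice(branch_rewards, lam, steps=200, lr=0.5):
    """Gradient descent on loss(z) = -sum_i p_i r_i + lam * H(p), with p = softmax(z)."""
    r = np.asarray(branch_rewards, dtype=float)
    z = np.zeros_like(r)                          # start from a uniform mixture of branches
    for _ in range(steps):
        p = softmax(z)
        h = entropy(p)
        grad_task = -p * (r - p @ r)              # gradient of -E_p[r] w.r.t. the logits
        grad_ent = -p * (np.log(p + 1e-12) + h)   # gradient of H(p) w.r.t. the logits
        z -= lr * (grad_task + lam * grad_ent)
    return softmax(z)

rewards = [1.0, 0.95, 0.2]                        # two nearly equally good branches
p_plain = train_choice(rewards, lam=0.0)
p_reg = train_choice(rewards, lam=0.5)
print(entropy(p_plain), entropy(p_reg))           # the regularized mixture is much closer to one-hot
```

Without the penalty, the two near-tied branches stay noticeably mixed, so discretizing afterwards changes the program's behavior. With it, almost all weight already sits on one branch before the argmax is taken.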

2605.18504 2026-05-19 cs.CL

Ancient Greek to Modern Greek Machine Translation: A Novel Benchmark and Fine-Tuning Experiments on LLMs and NMT Models

Spyridon Mavromatis, Sokratis Sofianopoulos, Prokopis Prokopidis, Maria Giagkou

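AI summary: This paper introduces a new benchmark and runs fine-tuning experiments on LLMs and NMT models to tackle low-resource machine translation from Ancient Greek to Modern Greek, showing that fine-tuning substantially improves translation performance.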

Journal ref
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026), pp. 8685-8698. European Language Resources Association (ELRA)
Comments
14 pages. Accepted for presentation at the 15th Language Resources and Evaluation Conference (LREC 2026), Palma, Mallorca, Spain

Abstract

Machine Translation (MT) from Ancient Greek (AG) to Modern Greek (MG) is a low-resource task, constrained by the lack of large-scale, high-quality parallel data. We address this gap by introducing the AG-MG Parallel Corpus, a new resource containing 132,481 sentence-aligned pairs derived from literary, historical, and biblical texts. We present a novel corpus creation pipeline that combines web-scraped, excerpt-level data with a multi-stage sentence-level alignment and refinement process. Our method uses VecAlign with LaBSE embeddings, which we first fine-tune on a manually aligned AG-MG subset, followed by an LLM-based error/misalignment correction phase using Gemini 2.5 Flash to ensure high alignment quality. Furthermore, we provide the first comprehensive benchmark of modern MT models on this task, evaluating three fine-tuning strategies across NMT models (NLLB, M2M100) and a Greek LLM (Llama-Krikri-8B). Our experiments show that fine-tuning yields significant improvements over base models, increasing performance by up to +10.3 BLEU points. Specifically, full-parameter fine-tuning of Llama-Krikri-8B achieves the highest overall performance with a BLEU score of 13.16, while the QLoRA-adapted M2M100-1.2B model shows the largest relative gains and highly competitive results. Our dataset and models represent a significant contribution to Greek NLP.

2605.18500 2026-05-19 cs.CL

Implicit Hierarchical GRPO: Decoupling Tool Invocation from Execution for Tool-Integrated Mathematical Reasoning

Li Wang, Xiaohan Wang, Xiaodong Lu, Zipeng Zhang, Jinyang Wu, Jiajun Chai, Wei Lin, Guojun Yin

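AI summary: This paper proposes decoupling tool invocation from execution, introducing delayed execution with explicit control to strengthen tool-integrated reasoning. It further proposes a hierarchical control framework and a theoretically derived surrogate loss that yields an implicitly hierarchical policy, resulting in the IH-GRPO algorithm. On six out-of-domain mathematical reasoning benchmarks, IH-GRPO achieves absolute improvements of 1.87%, 2.16%, and 2.53% over the strongest baseline method on Qwen3-1.7B, Qwen3-4B, and Qwen3-8B, respectively.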

Abstract

Large language models (LLMs) increasingly leverage tool invocation to enhance their reasoning capabilities. However, existing approaches typically couple tool invocation tightly with immediate execution. Such immediate tool interaction may disrupt the reasoning coherence of LLMs and constrain their expressivity, ultimately degrading reasoning performance. To this end, we propose and formalize, for the first time, the problem of decoupling tool invocation from execution during reasoning, and introduce delayed execution with explicit control to enhance tool-integrated reasoning (TIR). Furthermore, we propose a hierarchical control framework and theoretically derive a surrogate loss that enables an implicitly hierarchical policy to learn behavior equivalent to that of an explicit hierarchical policy, leading to the proposed IH-GRPO algorithm. In extensive experiments on six out-of-domain mathematical reasoning benchmarks, IH-GRPO achieves absolute improvements of 1.87%, 2.16%, and 2.53% over the strongest baseline method on Qwen3-1.7B, Qwen3-4B, and Qwen3-8B, respectively, while also yielding consistent performance gains in other domains. Our code is available at https://github.com/Lumina04/IH-GRPO-01.

2605.18498 2026-05-19 cs.LG cs.AI

DBES: A Systematic Benchmark and Metric Suite for Evaluating Expert Specialization in Large-Scale MoEs

Jing Wang, Hongxuan Lu, Jazze Young, Shu Wang, Zhimin Xin

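AI summary: This paper proposes DBES, a systematic benchmark and metric suite that evaluates expert specialization in MoE models using a multi-domain benchmark and five theoretically grounded metrics. It also verifies that these metrics are actionable in domain-specific post-training, where they yield significant performance gains.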

Abstract

Expert specialization in Mixture-of-Experts (MoE) models remains poorly understood, with traditional evaluations conflating architectural load-balancing with functional specialization. We introduce DBES, a comprehensive diagnostic framework combining a multi-domain benchmark with five theoretically grounded metrics: Routing Specialization, Normalized Effective Rank, Domain Isolation, Routing Stiffness Score, and N-gram Expertise measures. Key findings reveal distinct specialization paradigms across models: Qwen-series models exhibit modular specialization with high domain isolation, while DeepSeek and GLM employ distributed collaboration. However, we emphasize that specialization is a diagnostic dimension, necessary but not sufficient for downstream performance. Most crucially, interventional evidence validates the actionability of these metrics: by using DBES to identify high-specialization expert paths during domain-specific post-training, we achieved improvements of 66% to 94.48% in specialized domains with only 15% of the original training resources, demonstrating that these diagnostic tools can be converted into concrete optimization operators. This work provides the first systematic methodology for evaluating expert specialization independently of accuracy metrics, offering crucial insights for the design and post-training optimization of next-generation MoE systems.
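Two of the named quantities have common textbook forms that may help readers interpret them. Neither form is confirmed to be DBES's exact definition. Effective rank is sketched as the exponential of the Shannon entropy of the normalized singular values (Roy and Vetterli's definition), divided by the maximum possible rank as one plausible normalization. Routing specialization is sketched as one minus the normalized entropy of a domain's expert-routing distribution.

```python
import numpy as np

def normalized_effective_rank(m):
    """exp(entropy of normalized singular values) / min(m.shape), a value in (0, 1]."""
    s = np.linalg.svd(m, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]                                   # drop exact zeros before taking logs
    erank = np.exp(-(p * np.log(p)).sum())
    return float(erank / min(m.shape))

def routing_specialization(expert_counts):
    """1 - H(routing distribution) / log(num_experts): 0 = uniform routing, 1 = a single expert."""
    c = np.asarray(expert_counts, dtype=float)
    p = c / c.sum()
    nz = p[p > 0]
    h = -(nz * np.log(nz)).sum()
    return float(1.0 - h / np.log(len(c)))

rng = np.random.default_rng(0)
print(normalized_effective_rank(rng.normal(size=(64, 16))))  # high: variance spread over many directions
print(routing_specialization([900, 50, 30, 20]))              # well above 0: one expert dominates
```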

2605.18491 2026-05-19 cs.CV

Benchmarking transferability of SSL pretraining to same and different modality segmentation tasks

Jue Jiang, Harini Veeraraghavan

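AI summary: This paper benchmarks nine SSL methods on segmentation tasks in the same and in different modalities, evaluating the transferability and efficiency of the pretrained models. It finds that the self-distilled masked image transformer performs best in segmentation accuracy, convergence speed, and the few-shot-to-many-shot performance gap.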

Comments
Paper submitted to Medical Physics for review

Abstract

Methods: Nine SSL methods spanning four pretext-task families were pretrained from scratch on the same 10,412 3D CT scans (1.89M 2D axial slices) covering varied disease sites. The pretrained Swin Transformer encoder from each method was integrated into a SwinUNETR-style segmentation network (Swin encoder with a 3D CNN decoder and skip connections) and fine-tuned on nine public segmentation tasks of varying complexity, including large abdominal organs, head-and-neck structures, and tumors from CT and MRI. Performance was assessed using the Dice similarity coefficient (DSC). Fine-tuning convergence speed, transferability across modalities (CT-to-MRI), and feature-reuse patterns between few- and many-shot fine-tuning were further analyzed using centered kernel alignment.

Results: The self-distilled masked image transformer (SMIT), which combines masked image modeling (MIM) with local and global self-distillation, achieved the highest overall segmentation accuracy across the nine tasks, the fastest fine-tuning convergence, and the smallest few-shot-to-many-shot performance gap, indicating the strongest data efficiency. SMIT also showed the most consistent feature-reuse patterns between few- and many-shot fine-tuning. MIM-based SimMIM and self-distillation methods (DINO, iBOT) outperformed contrastive learning and rotation prediction, which rely on image-level global representations. Differences between SSL methods were largest in the few-shot setting and narrowed as the size of the labeled fine-tuning dataset increased, indicating that the choice of SSL pretraining matters most under limited annotation budgets.
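Centered kernel alignment, which the study uses to compare feature-reuse patterns, has a simple linear form. The NumPy sketch below assumes two activation matrices whose rows correspond to the same inputs and whose widths may differ, for example encoder features after few-shot versus many-shot fine-tuning. The abstract does not say whether linear or kernel CKA was used, so this shows the linear variant.

```python
import numpy as np

def linear_cka(x, y):
    """Linear CKA between activations x (n, d1) and y (n, d2) computed on the same n inputs.

    Returns a similarity in [0, 1]; it equals 1 when the representations differ only by
    an orthogonal transform and isotropic scaling.
    """
    x = x - x.mean(axis=0, keepdims=True)        # center every feature column
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, "fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, "fro")
    norm_y = np.linalg.norm(y.T @ y, "fro")
    return float(cross / (norm_x * norm_y))

rng = np.random.default_rng(0)
feats_few = rng.normal(size=(200, 32))                  # e.g. features after few-shot fine-tuning
q, _ = np.linalg.qr(rng.normal(size=(32, 32)))          # random orthogonal matrix
feats_rotated = 3.0 * feats_few @ q                     # same features, rotated and rescaled
feats_other = rng.normal(size=(200, 32))                # unrelated features
print(linear_cka(feats_few, feats_rotated), linear_cka(feats_few, feats_other))
```

The invariance to rotation and scaling is what makes CKA suitable for comparing layers across separately fine-tuned models, whose individual neurons need not line up.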

2605.18490 2026-05-19 cs.CL cs.IR

Vector RAG vs LLM-Compiled Wiki: A Preregistered Comparison on a Small Multi-Domain Research

Theodore O. Cochran

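AI summary: Through a preregistered comparison, this paper examines how a vector RAG system and an LLM-compiled wiki perform on a small multi-domain research corpus. It finds that each has strengths and weaknesses in cross-paper connections, answer organization, and cost, and that no single system is best on all metrics.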

Abstract

We preregistered a comparison of two ways to help an LLM answer questions over a small research corpus: a single-round Vector RAG system and an LLM-compiled markdown wiki. Both systems answered the same 13 questions over 24 papers using the same answer-generating model, and their answers were scored by blinded LLM judges. The wiki scored much better at connecting findings across papers, but its advantage in answer organization was not strong after judge adjustment. RAG met the preregistered test for single-fact lookup questions. The clean query-side cost result went against the expected wiki advantage: under the tested setup, the wiki used far more query tokens than RAG, so it could not recover any upfront build cost through cheaper queries. Two exploratory analyses changed how we interpret the result. First, claim-level citation checking favored the wiki: its cited pages more often supported the exact claims being made, even though RAG scored better on the overall groundedness rubric. Second, a decomposition-based RAG variant recovered most of the wiki's advantage on cross-paper synthesis at lower LLM-token cost, but it did not recover the wiki advantage in claim-by-claim citation support. The main conclusion is that grounded research synthesis is not a single capability. Systems can differ in how well they organize evidence, how well their citations support each claim, and how much they cost to run. In this study, no architecture was best on all three.
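For readers unfamiliar with the baseline, the retrieval step of a single-round vector RAG system reduces to a nearest-neighbour lookup. The sketch below makes simplifying assumptions: a toy bag-of-words vectorizer stands in for a learned embedding model, and the top-k chunks would then be pasted into the answering model's prompt. None of these names come from the study.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding' (a stand-in for a neural embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    """Single round: embed the query once, rank all chunks by similarity, keep the top k."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "noise2noise training uses pairs of noisy spectra",
    "linear cka compares representations across layers",
    "vector rag retrieves chunks by embedding similarity",
]
print(retrieve("how does rag retrieve chunks", chunks, k=1))
```

A decomposition-based variant, as described in the abstract, would split a question into sub-questions and run a retrieval like this for each. That extension is not shown here.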

2605.18483 2026-05-19 cs.LG cs.AI

Modality vs. Morphology: A Framework for Time Series Classification for Biological Signals

Jordan Tschida, Matthew Yohe, Edward Kane, Gavin Jager, Emma J. Reid, Tony G. Allen, Mark Story, Leanne Thompson, Joe Hoskins, Brandon Schreiber, Stan Seiferth, Scott Dolvin, David Cornett

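AI summary: This paper proposes a unified morphology-modality framework that analyzes the morphological structure of biological signals to show how it shapes model design and performance. It stresses the importance of morphology for preprocessing and modeling strategies and identifies future directions, including morphological data augmentation and improved evaluation metrics.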

Abstract

Time series classification (TSC) of biological signals has progressed from handcrafted, modality-specific approaches to deep architectures capable of representing the diverse waveform structures of underlying physiological processes (i.e., morphology). This review introduces a unified morphology-modality framework that connects waveform structure to methodological design, revealing how spikes, bursts, oscillations, slow drift, and hierarchical rhythms inform model design. By analyzing electroencephalography, electromyography, electrocardiography, photoplethysmography, and ocular modalities (electrooculography, pupillometry, eye-tracking), the review demonstrates how morphology determines preprocessing and modeling strategies. Integrating evidence across these biological signals, the framework reveals that morphology, not model class, most strongly determines performance and interpretability. This provides insight into why deep models succeed when their inductive biases align with underlying waveform dynamics. This review also identifies directions for future work, including morphological data augmentation and evaluation metrics, to improve generalization. Together, these insights position morphology-aware modeling as a unifying principle for developing generalizable, interpretable, and physiologically meaningful TSC models across biological signals.

2605.18482 2026-05-19 cs.RO

Bidirectional Optical sensors for Actuation Tracking (BOAT) in soft lattice systems

Petr Trunin, Carolina Gay, Anderson Brazil Nardin, Trevor Exley, Diana Cafiso, Lucia Beccai

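AI summary: This paper proposes BOAT, an optical sensor made of two waveguides arranged in an ellipsoidal geometry, for monitoring the global deformation of soft lattice structures, particularly compression and extension. Experiments over repeated pressure cycles verify its high repeatability and reliability.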

Abstract

The growing adoption of lattice-based structures in soft robotics creates a need for advanced sensing solutions capable of monitoring their global deformation, particularly compression and extension. In this work, we address this challenge by introducing a novel optical sensor based on two patterned waveguides arranged in an ellipsoidal geometry. This Bidirectional Optical sensor for Actuation Tracking (BOAT) is seamlessly co-printed with a lattice structure actuated by an embedded pneumatic artificial muscle (PAM), and its performance is assessed. During PAM elongation or contraction, the bending of the embedded BOAT waveguides induces output signal variations that enable a clear discrimination between compression and extension states. The design of each waveguide structure (by surface patterning) and of the sensorized lattice-based unit embedding two BOATs is supported by numerical simulations. Experimental calibration over 100 consecutive pressure cycles ranging from +50 kPa to −40 kPa demonstrates a highly repeatable response, allowing a reliable distinction between extension and compression. Finally, sensor feedback is used to implement a digital shadow, enabling continuous synchronization between the whole sensorized unit and its virtual counterpart. These results establish BOAT as a powerful and reliable approach for deformation monitoring in soft lattice-based robotic systems.

2605.18481 2026-05-19 cs.AI

OCCAM: Open-set Causal Concept explAnation and Ontology induction for black-box vision Models

Chiara Maria Russo, Simone Carnemolla, Simone Palazzo, Daniela Giordano, Concetto Spampinato, Matteo Pennisi

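AI summary: OCCAM uses open-set causal concept explanation and ontology induction to improve the interpretability of black-box vision models, revealing causal relations between concepts as well as model biases.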

Abstract

Interpreting the decisions of deep image classifiers remains challenging, particularly in black-box settings where model internals are inaccessible. We introduce OCCAM, a framework for open-set causal concept explanation and ontology induction in vision models. OCCAM discovers visual concepts in an open-set manner, localizes them via text-guided segmentation, and performs object-level interventions by removing concepts to measure changes in class confidence, estimating each concept's causal contribution. Beyond local explanations, OCCAM aggregates interventional evidence across a dataset to induce a structured concept ontology that captures how classifiers globally organize visual concepts. Reasoning over this ontology reveals consistent dependencies between concepts, exposes latent causal relations, and uncovers systematic model biases. Experiments on Broden and ImageNet-S across multiple classifiers show that OCCAM improves explanation quality in open-set black-box settings while providing richer global insight than per-image attribution methods.
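The intervention step can be sketched as the change in class confidence before and after a concept is removed. The sketch assumes a black-box `classify` function that returns class probabilities and a binary mask per concept; in OCCAM these masks come from text-guided segmentation. Filling the masked region with the image's mean color is a hypothetical removal choice, and the dataset-level averaging is likewise illustrative rather than OCCAM's aggregation.

```python
import numpy as np

def remove_concept(img, mask):
    """Overwrite the concept's pixels with the image's mean color (one simple removal choice)."""
    out = img.astype(float)                              # astype returns a copy
    out[mask] = img.reshape(-1, img.shape[-1]).mean(axis=0)
    return out

def concept_effect(classify, img, mask, target):
    """Causal-contribution estimate: drop in target-class confidence after removal."""
    before = classify(img)[target]
    after = classify(remove_concept(img, mask))[target]
    return float(before - after)

def aggregate_effects(classify, samples, target):
    """Average effect per concept name over (image, {concept_name: mask}) pairs."""
    totals, counts = {}, {}
    for img, masks in samples:
        for name, mask in masks.items():
            totals[name] = totals.get(name, 0.0) + concept_effect(classify, img, mask, target)
            counts[name] = counts.get(name, 0) + 1
    return {name: totals[name] / counts[name] for name in totals}

# Toy black box: confidence in class 1 is the fraction of strongly red pixels.
def classify(img):
    p_red = float((img[..., 0] > 128).mean())
    return np.array([1.0 - p_red, p_red])

img = np.zeros((8, 8, 3), dtype=np.uint8)
img[:4, :4, 0] = 255                                     # a red square acting as the "concept"
masks = {"red_square": np.zeros((8, 8), bool), "dark_corner": np.zeros((8, 8), bool)}
masks["red_square"][:4, :4] = True
masks["dark_corner"][6:, 6:] = True
print(aggregate_effects(classify, [(img, masks)], target=1))
```

Removing the red square drops the toy classifier's confidence, while removing an irrelevant corner leaves it unchanged. Effects of this kind, aggregated over a dataset, are the raw material from which an ontology of concept dependencies could be induced.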